By Vishal Krishna.
Feifei Li, President of Database Systems at Alibaba, feels that AI, which is currently limited to computer vision and speech/voice recognition, will play a huge role in business operations in the future.
Alibaba and Jack Ma are household names today. The tech giant has a market cap among the global top 10 and has expanded into all major markets in the world. Technology has driven Alibaba's superlative growth, and has helped the company do what Amazon, Google, eBay, PayPal, FedEx, as well as wholesalers and umpteen manufacturers do in the US. Now, the world over, companies are adopting and adapting to artificial intelligence and the future of business. And Alibaba is leading the way.
Fostering the tech behemoth's growth is Feifei Li, the Vice President of the Alibaba Group and President of Database Systems at Alibaba. Before joining the tech conglomerate, whose revenues are $56.15 billion, Feifei was a professor at the University of Utah, US. Feifei believes AI is "yet to make a big impact" and remains limited to heuristics like computer vision, speech, and voice recognition. But he feels that one thing is crystal clear: AI will in time play a big role in the way business is conducted.
Feifei Li, Vice President of Alibaba Group and President of Database Systems at the tech conglomerate.
YourStory caught up with Feifei Li for a candid conversation on the future of databases, what engineers need to learn to power automated databases, and what the company has to offer to data scientists.
Databases, especially relational databases, are a mature technology that has been around for 40 years. I feel like a dinosaur. That's part of the reason why this conversation is important and exciting. You know what happened to dinosaurs, right? They went extinct. So, how does one evolve in the tech world and not become extinct? The cloud has provided several opportunities, and there are several cloud-native database companies that can compete with the likes of Oracle. The future of database technology is the cloud-native database.
But not many people realize that the cloud was originally conceived as a virtualization of resources such as storage and computing. These resources are bundled as a pool and sold as infrastructure-as-a-service. This is amazing because the cloud is elastic and easily scalable, which is why you see the proliferation of new startups. With the cloud, instead of working with fixed costs, you can work on a pool of resources with a variable cost. That's why business conversations are now about elasticity and high availability. You can be highly available if you are in the cloud. There will be zero downtime for your services.
Now coming back to a cloud-native database. Cloud-native database systems have been around since 2005. Storage, network, and virtualization were the first disruptive technologies to take off as cloud service offerings. After that, a lot of changes happened in the platform layer with algorithms coming in by 2014. Tech disruption happens layer by layer, so a database is no longer legacy. In a traditional database, resources (specifically, storage and computing) are bundled together and you cannot tap the power of pooled resources.
Our in-house database management system, POLARDB, decouples the computing and storage resources of a database. This architecture lets companies scale storage and compute up or back down independently. You can manage the CPU or the database (DB) at the click of a button. It is automated. At Alibaba, we have the Auto Scaler. You can automate and monitor workloads without needing people to do those tasks. It is on demand and elastic, which means businesses save on cost. It even includes NewSQL.
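To make the decoupling concrete, here is a minimal sketch of how compute and storage might be resized by separate rules. It is an illustration only and not POLARDB's or the Auto Scaler's actual logic: the `Cluster` type, the function names, and every threshold are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Cluster:
    compute_nodes: int  # sized by CPU load
    storage_gb: int     # shared storage pool, sized by data volume


def scale_compute(cluster: Cluster, cpu_utilization: float,
                  high: float = 0.75, low: float = 0.25,
                  min_nodes: int = 1, max_nodes: int = 16) -> Cluster:
    """Return a resized cluster; storage is deliberately left untouched."""
    nodes = cluster.compute_nodes
    if cpu_utilization > high:
        nodes = min(max_nodes, nodes * 2)   # scale out under load
    elif cpu_utilization < low:
        nodes = max(min_nodes, nodes // 2)  # scale back in when idle
    return Cluster(compute_nodes=nodes, storage_gb=cluster.storage_gb)


def grow_storage(cluster: Cluster, used_gb: int,
                 headroom: float = 0.2) -> Cluster:
    """Grow the storage pool when free space falls below the headroom."""
    storage = cluster.storage_gb
    if used_gb > storage * (1 - headroom):
        storage = int(used_gb / (1 - headroom)) + 1
    return Cluster(compute_nodes=cluster.compute_nodes, storage_gb=storage)
```

Because the two functions never touch each other's field, a traffic spike can add compute nodes without paying for extra storage, and a growing dataset can add storage without adding CPUs.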
Jargon and terminology apart, I have to explain this technically and talk about old structured-data relational database management systems. Earlier, a big part of the database business was about providing consistency and durability guarantees. This meant ensuring that updates were consistent. To make sure performance was consistent, you needed systems that could manage high-throughput workloads while still ensuring consistency.
Google changed all this 10 years ago, however. Their belief was that this old model could not work with new applications that generated massive amounts of data. The world needed the availability of databases rather than a durability guarantee.
Businesses in the modern world needed a highly scalable database unlike those that offered a structured approach of working with data. A decade ago, rather than worrying about traditional requirements of consistency, it was important to scale horizontally with distributed solutions while handling massive data. That's how big data processing tools like MapReduce and Hadoop were born.
This also gave rise to NewSQL systems, which came in around a decade ago and allowed handling massive amounts of data from decoupled resources in the cloud. A company could scale from 100 nodes to 1,000 nodes in seconds, as ecommerce companies do during a sale when traffic spikes suddenly. Alibaba has a partnership with MongoDB, which offered NewSQL technologies.
NewSQL is not just about scale. It also gives you consistency guarantees, as a relational database does. It combines the best of distributed and cloud-native architectures. We also have a hybrid database management system, where database instances can run both on-premises and in the cloud.
Yes, our product, Data Lake Analytics (DLA), combines data from all sources across legacy and cloud infrastructure. With DLA, data from file systems, relational databases, and NewSQL systems can be pulled into our data lake to create an interactive analytical processing capability. These analytical databases combine processing of structured and unstructured data on a large scale. This helps data scientists use machine learning algorithms to understand structured and unstructured data together.
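The idea of analysing file-based and relational data together can be sketched with Python's standard library alone. This is not the DLA API, and unlike a real data lake engine it simply copies both sources into one in-memory SQLite database before querying. The table names, the CSV contents, and the user names are all made up for illustration.

```python
import csv
import io
import sqlite3

# Hypothetical "file system" source: click events exported as CSV.
CLICKS_CSV = """user_id,page
1,home
1,checkout
2,home
"""


def build_lake() -> sqlite3.Connection:
    """Load a relational source and a file source into one query engine."""
    lake = sqlite3.connect(":memory:")
    # Hypothetical "relational database" source: a users table.
    lake.execute("CREATE TABLE users (user_id INTEGER PRIMARY KEY, name TEXT)")
    lake.executemany("INSERT INTO users VALUES (?, ?)",
                     [(1, "Asha"), (2, "Ben")])
    # The file-based source lands next to it as a second table.
    lake.execute("CREATE TABLE clicks (user_id INTEGER, page TEXT)")
    rows = csv.DictReader(io.StringIO(CLICKS_CSV))
    lake.executemany("INSERT INTO clicks VALUES (?, ?)",
                     [(int(r["user_id"]), r["page"]) for r in rows])
    return lake


def clicks_per_user(lake: sqlite3.Connection) -> list:
    """Join the two sources with one interactive SQL query."""
    query = """
        SELECT u.name, COUNT(*) AS clicks
        FROM users AS u
        JOIN clicks AS c ON c.user_id = u.user_id
        GROUP BY u.name
        ORDER BY u.name
    """
    return lake.execute(query).fetchall()
```

Calling `clicks_per_user(build_lake())` returns one row per user with a click count, which is the kind of cross-source question an analyst would otherwise answer by first reshaping the data by hand.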
The work experience with data will be much better than it was in the past. The productivity of data scientists is boosted because they don't have to spend too much time structuring data. We also have a product, Data Works, which has several machine learning algorithms to help data scientists make sense of data.
The answer to this question is a bit complex. Cloud computing has changed everything because it has fuelled the growth of data. But we are still far from real AI. We use deep neural networks today, and they need large-scale data to be really useful. AI is a black box today, but AI tech used as heuristics has worked. It has made a mark in computer vision and speech recognition.
Now, it is making a mark in databases too. We will have self-driving databases in the future, and our roadmap is to fully automate a database. The complexity in automating databases arises because usage changes from customer to customer, which makes it tough to automate the entire process. However, we can use AI for common scenarios. For example, we can help ecommerce or traditional systems to manage their latency and scalability and use algorithms to ensure that databases are secure and running fine.
We are integrating blockchain with database systems with Ledger DB. This can syndicate and verify the integrity of data and logs. Closely associated with Alibaba Group, Ant Financial has AliPay, which happens to use blockchain. There is a strong ecosystem and when you transfer money from one party to another, the company uses blockchain to track the integrity of transactions between banks and merchants.
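One common way to make a log tamper-evident is a hash chain, in which every entry stores the hash of the entry before it. The sketch below shows that general technique only. It is not the design of Ledger DB or of AliPay's blockchain, and the entry layout and function names are assumptions made for illustration.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def entry_hash(prev_hash: str, payload: dict) -> str:
    """Hash the payload together with the previous entry's hash."""
    body = json.dumps(payload, sort_keys=True).encode("utf-8")
    return hashlib.sha256(prev_hash.encode("utf-8") + body).hexdigest()


def append(ledger: list, payload: dict) -> None:
    """Add an entry that is chained to the current tail of the ledger."""
    prev_hash = ledger[-1]["hash"] if ledger else GENESIS
    ledger.append({
        "payload": payload,
        "prev": prev_hash,
        "hash": entry_hash(prev_hash, payload),
    })


def verify(ledger: list) -> bool:
    """Recompute every hash; any edited or reordered entry breaks the chain."""
    prev_hash = GENESIS
    for entry in ledger:
        if entry["prev"] != prev_hash:
            return False
        if entry["hash"] != entry_hash(prev_hash, entry["payload"]):
            return False
        prev_hash = entry["hash"]
    return True
```

If someone later edits the amount in an early transfer, `verify` returns `False`, because the stored hash no longer matches the recomputed one. A production ledger would add signatures, timestamps, and distribution across parties on top of this.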
Engineers and developers need not worry about which computer language they know. They have to know open source technologies and languages, because we will never build our systems using closed technologies. If you are a database engineer, you don't have to learn new things. Postgres or Oracle DB will do. At Alibaba, you need fundamental math skills and logic.
Original Source: https://yourstory.com/2020/02/self-driving-databases-future-ai-alibaba?utm_pageloadtype=scroll