Alibaba Cloud AnalyticDB for PostgreSQL is a massively parallel processing based data warehouse service on Alibaba Cloud. Its kernel was evolved from the Greenplum open-source database based on the PostgreSQL engine, with improved analysis performance. The column-store meta scan function is key to improving database query performance.
AnalyticDB for PostgreSQL supports column stores and features high data compression performance and superior query performance. However, given that the database system reads data from the entire column or creates a B-Tree index when processing query criteria with a high filter rate, two constraints affect indexing. The first is that indexes are not compressed, which can lead to data bloat. The other constraint is that large result sets lead to index failure and a higher index cost than that of tablescan.
To combat these constraints, AnalyticDB for PostgreSQL provides the meta scan function, which features excellent filter performance and extremely small storage usage.
As a further enhancement of column store tables, the meta scan function collects the maximum and minimum values from the columns of a column store table to implement a filtering function similar to indexing based on the block-offset mechanism. The metadata stores the maximum value, minimum value, and offset, all of which are collected from each block of columns. The maximum and minimum values are used for conditional filtering. The block offset is used to skip the blocks that do not meet filter conditions. This minimizes the I/O of reading column stores and the CPU consumption of block decompression, ultimately improving query performance. In a TPC-H 1 TB test, query performance increased by 29% and can be increased by up to 500% in queries based on conditions with a high filter rate, such as Q06, Q12, Q14, and Q15 of TPC-H.
In AnalyticDB for PostgreSQL, the column-store meta scan function is used to filter out non-compliant blocks based on the maximum and minimum values of row group and batch group metadata so that only compliant blocks need to be read. This reduces the scan I/O and the CPU consumption during block decompression. Therefore, the meta scan function can significantly improve the performance of queries with strong filter conditions.
AnalyticDB for PostgreSQL is built on the open-source databases PostgreSQL and Greenplum, and is maintained by the ApsaraDB OLAP database team. In the near future, Alibaba Cloud will make all of the functions connected with AnalyticDB for PostgreSQL will be made open source as a contribution to the open-source community.
Cloud-based data services are playing a vital role in providing a large and powerful platform to store, manage, and analyze lots of unstructured data for IoT applications. These services are designed to facilitate massive volumes of unprocessed, non-relational data and to perform data analytics, control, and monitoring tasks in an efficient and cost-effective way, which would not be possible otherwise. Moreover, the complexity of data keeps varying, such as the conclusion extracted from the unprocessed data, which was received as input to these IoT devices is mainly in the form of trends, patterns, and statistical visuals to help businesses behave proactively to facilitate their users. Think of this from an automobile's perspective, who might have to keep a track of their systems that are installed in thousands (if not millions) of vehicles on the road.
The technology stack for IoT specific applications is vast, from automating the coffee machine as per the user preference, all the way to the development of smart devices for home automation and automobile industries. Therefore, these cloud-driven data services are able to meet a diversified range of ever growing data requirements.
So, this article focuses on how PostgreSQL by ApsaraDB designed and supported by Alibaba Cloud meets the requirements of IoT applications while maintaining the needs of varying data structures and datasets. Furthermore, we will discuss how IoT clusters are formed and designed to cater to these essential data requirements received from different connected devices.
Here is a default purchase page that you see when you are going to buy the product:
As this screenshot demonstrates, the product purchase page only asks for the basic information, most of the operational headache is taken care of by the infrastructure. Take the "Edition" as an example, the "High-availability" configures the underlying topology for the resources—master/slave nodes, and backups. Same applies to other settings, such as the networks to deploy this database in.
So far, we have discussed the significance of using cloud service providers to meet the database challenges for IoT applications. In this section, we will highlight why Alibaba Cloud PostgreSQL is the most suitable and reliable database service to meet the super dynamic requirements associated with IoT devices and applications—not only as compared to other cloud-based database system but also when it comes to the relational databases offered by Alibaba Cloud ApsaraDB too. And then why Alibaba Cloud is the most viable cloud service to fulfill these intensified capabilities of Postgres by leveraging the streamline features of a cloud platform. However, if you want to have an overview of the basic functionalities of PostgreSQL supported by Alibaba Cloud ApsaraDB before moving further to explore Postgres capabilities for IoT applications, refer this handy and explanatory documentation - ApsaraDB RDS for PostgreSQL—or read my previous post that explore the deployment options made available on Alibaba Cloud for PostgreSQL.
ApsaraDB AnalyticDB for PostgreSQL is an online MPP (Massively Parallel Processing) data warehousing service based on the open source Greenplum Database.
ApsaraDB AnalyticDB for PostgreSQL provides online expansion and performance monitoring service to free your team from complicated MPP cluster operations and management (O&M). This enables database administrators, developers and data analysis to focus on upgrading enterprise productivity through SQL development.
An on-demand database hosting service for PostgreSQL with automated monitoring, backup and disaster recovery capabilities
Scenarios
Benefits
ApsaraDB for RDS allows you to quickly build a stable and reliable database system. It has the following advantages compared with user-created databases:
ApsaraDB for RDS has significant advantages in cost-effectiveness, availability, reliability, ease of use, and performance. Its cost is one third of the expense to build databases on ECS instances and one tenth of the expense to build databases on physical servers.
2,599 posts | 762 followers
FollowApsaraDB - March 12, 2020
ApsaraDB - February 25, 2021
Alibaba Clouder - May 20, 2020
Alibaba Clouder - January 25, 2021
Alibaba Clouder - November 11, 2020
Alibaba Clouder - July 29, 2019
2,599 posts | 762 followers
FollowAn online MPP warehousing service based on the Greenplum Database open source program
Learn MoreAn on-demand database hosting service for PostgreSQL with automated monitoring, backup and disaster recovery capabilities
Learn MoreMore Posts by Alibaba Clouder
Dikky Ryan Pratama May 6, 2023 at 2:08 pm
Dear Alibaba Clouder ,I wanted to express my heartfelt gratitude for your incredible article on Column-Store Meta Scans and Provisioning IoT Clusters . I was truly impressed by your insightful analysis and the way you presented the information in a clear and concise manner. Your writing style is engaging and it kept me hooked from beginning to end.What I particularly appreciated about your article was how you highlighted the importance of cloud storage service. Your perspective on this issue is thought-provoking and has given me a new understanding of the subject. I believe that your article will inspire many readers to think critically about this topic and to take action to make a positive change.Once again, thank you for sharing your knowledge and expertise through your excellent article. Your work is an inspiration to many and I look forward to reading more of your writing in the future.Best regards,