How to Optimize Data Queries for Time Series Databases

Data merging and data cleaning are required in many scenarios. Window queries can handle this kind of operation, but how can we make them faster and retrieve batch data quickly?
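As a baseline, here is a minimal sketch of the window-query approach. It assumes the cleaning task is to keep only the most recent reading for each ID. The table `sensor_data(sid, ts, val)` is hypothetical. SQLite, through Python's built-in `sqlite3` module (window functions need SQLite 3.25 or later), stands in for PostgreSQL so that the sketch runs without a server. The SQL has the same shape in PostgreSQL.

```python
import sqlite3

# Hypothetical schema: one row per reading (sensor id, timestamp, value).
# SQLite stands in for PostgreSQL here; window functions need SQLite 3.25+.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sensor_data (sid INTEGER, ts INTEGER, val REAL)")
conn.execute("CREATE INDEX idx_sid_ts ON sensor_data (sid, ts)")
conn.executemany(
    "INSERT INTO sensor_data VALUES (?, ?, ?)",
    [(sid, ts, sid * 10.0 + ts) for sid in range(1, 4) for ts in range(1, 6)],
)

# Rank each sensor's readings newest-first and keep only rank 1.
# The inner query visits every row, so its cost grows with table size.
LATEST_BY_WINDOW = """
SELECT sid, ts, val
FROM (
    SELECT sid, ts, val,
           row_number() OVER (PARTITION BY sid ORDER BY ts DESC) AS rn
    FROM sensor_data
) AS ranked
WHERE rn = 1
ORDER BY sid
"""

latest = conn.execute(LATEST_BY_WINDOW).fetchall()
for row in latest:
    print(row)
```

The subquery alias (`AS ranked`) is optional in SQLite. It is included so that the same statement also works on PostgreSQL versions that require an alias.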

Here is a quick summary of the common methods for optimizing time series data queries:

Use recursion when there are few unique values and their range is unknown.
Use a subquery when the number of unique values is relatively small and you know the specific range of the unique values.
Use a window query when there are many unique values.
In that same many-unique-values scenario, stream computing performs even better.
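The recursive approach can be expressed as a loose index scan, sometimes called a skip scan. This is one common way to write it, not necessarily the exact query behind the benchmark. A recursive CTE starts at the smallest ID and repeatedly jumps to the next larger one, so each step is a single probe into a `(sid, ts)` index. The work therefore grows with the number of unique IDs rather than the number of rows. As in the other sketches, the `sensor_data` table and its data are hypothetical, and SQLite via Python's `sqlite3` stands in for PostgreSQL.

```python
import sqlite3

# Hypothetical schema and data: 5 sensors with 1,000 readings each.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sensor_data (sid INTEGER, ts INTEGER, val REAL)")
# The (sid, ts) index is what turns each recursive step into a cheap probe.
conn.execute("CREATE INDEX idx_sid_ts ON sensor_data (sid, ts)")
conn.executemany(
    "INSERT INTO sensor_data VALUES (?, ?, ?)",
    [(sid, ts, sid * 10.0 + ts) for sid in range(1, 6) for ts in range(1, 1001)],
)

LATEST_BY_RECURSION = """
WITH RECURSIVE skip(sid) AS (
    -- Anchor: the smallest sid in the table.
    SELECT (SELECT sid FROM sensor_data ORDER BY sid LIMIT 1)
    UNION ALL
    -- Step: jump to the next larger sid; yields NULL after the last one.
    SELECT (SELECT sid FROM sensor_data
            WHERE sid > skip.sid ORDER BY sid LIMIT 1)
    FROM skip
    WHERE skip.sid IS NOT NULL
)
-- For each distinct sid found above, fetch its newest reading.
-- The trailing NULL row from the CTE matches nothing in the join.
SELECT d.sid, d.ts, d.val
FROM skip
JOIN sensor_data AS d
  ON d.sid = skip.sid
 AND d.ts = (SELECT MAX(ts) FROM sensor_data WHERE sid = skip.sid)
ORDER BY d.sid
"""

latest = conn.execute(LATEST_BY_RECURSION).fetchall()
print(latest)
```

The join assumes timestamps are unique within each sensor. If they are not, a sensor can return more than one row.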

Efficiency Comparison Table

Data volume (rows)	Unique values	Window query (ms)	Subquery (ms)	Recursive query (ms)
5 million	1 million	6,446	2,892	6,706
5 million	1,000	6,176	7	9
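Reading the table as ratios makes the gap easier to see. The short calculation below divides each window-query time by the corresponding subquery and recursive-query times. The numbers are copied from the table; nothing is re-measured.

```python
# Timings in milliseconds, copied from the table: (window, subquery, recursive).
timings = {
    "1 million unique values": (6446, 2892, 6706),
    "1,000 unique values": (6176, 7, 9),
}

# Speedup relative to the window query; values above 1 mean faster.
speedups = {
    label: (window / subquery, window / recursive)
    for label, (window, subquery, recursive) in timings.items()
}

for label, (sub_x, rec_x) in speedups.items():
    print(f"{label}: subquery {sub_x:.2f}x, recursion {rec_x:.2f}x")
```

With 1,000 unique values, the subquery and recursive queries come out roughly 880 and 690 times faster than the window query. With 1 million unique values, the subquery is only about 2.2 times faster, and recursion is slightly slower (about 0.96x).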

PostgreSQL is the best choice among open-source databases for this kind of work because it provides several solutions to the same problem, leaving you free to choose the one that best fits your needs.

Use recursion when the number of unique values is relatively small and their range is unknown.
Use a subquery when the number of unique values is relatively small and their range is known. For example, if the full range covers 1 million entries but this batch includes only 500,000 of them, performance is best when you already have the IDs of those 500,000 entries; otherwise you need to scan all 1 million. Similarly, a system may have 100 million users in total, while only tens of thousands of them are active in a given interval.
Use a window query if the number of unique values is relatively large.
Stream computing performs better than a window query when the number of unique values is relatively large.
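The subquery approach can be sketched the same way. It assumes the IDs for the batch are already known. Here a hypothetical `batch_ids` table stands in for, say, the active users in an interval. The query is driven from that list, and a correlated subquery looks up each ID's latest timestamp through the `(sid, ts)` index, so only rows for the listed IDs should need to be read. As before, the table names and data are illustrative, and SQLite via Python's `sqlite3` stands in for PostgreSQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sensor_data (sid INTEGER, ts INTEGER, val REAL)")
conn.execute("CREATE INDEX idx_sid_ts ON sensor_data (sid, ts)")
# Hypothetical data: 100 sensors with 50 readings each.
conn.executemany(
    "INSERT INTO sensor_data VALUES (?, ?, ?)",
    [(sid, ts, sid * 10.0 + ts) for sid in range(1, 101) for ts in range(1, 51)],
)

# Hypothetical list of IDs known to belong to this batch (e.g. active users).
conn.execute("CREATE TABLE batch_ids (sid INTEGER PRIMARY KEY)")
conn.executemany("INSERT INTO batch_ids VALUES (?)", [(7,), (42,), (99,)])

# Drive the query from the known IDs: for each one, a correlated subquery
# finds its latest timestamp via the (sid, ts) index.
LATEST_BY_SUBQUERY = """
SELECT d.sid, d.ts, d.val
FROM batch_ids AS b
JOIN sensor_data AS d
  ON d.sid = b.sid
 AND d.ts = (SELECT MAX(ts) FROM sensor_data WHERE sid = b.sid)
ORDER BY d.sid
"""

latest = conn.execute(LATEST_BY_SUBQUERY).fetchall()
print(latest)
```

The design choice is to start from the small, known list of IDs rather than from the large table. That is why this method pays off only when the range of unique values is known in advance.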

For a detailed comparison of recursive queries, subqueries, and window queries, see Optimizing Time Series Querying on Alibaba Cloud RDS for PostgreSQL.

Alibaba Clouder