PostgreSQL Swinging Door Trending (SDT) Algorithm for Stream Computing Applications

By Digoal

In stream computing applications, retaining the window and data recalculability-which is, to put in simpler terms, the ability to set the start point to compute again-as well as the overwriting of existing computing results are two special but very useful requirements. What does all of this mean? Well, consider a few examples to better understand these requirements.

Consider this issue. Streaming data comes from multiple business lines and is generated at different times. However, due to system defects, the data arrival sequence may be incorrect. For example, data that was generated earlier may actually arrive later, rather than the chronological sequence that would be expected.

Now, here's a more real-worldish example from an e-commerce scenario: a customer places an order first, then pays for the order, and then, as the final step, good or service that the customer bought is delivered. The related data is generated by multiple business lines-one for the order placing, another for the payment, and another for the shipping or delivery of the order.

These should come into the system in the same chronological order that they occurred in reality: order, then payment, and then delivery. However, the actual data arrival sequence may be, and often is, as follows: payment, then order, and then delivery. This incorrect data arrival sequence affects the stream computing accuracy and can be corrected through recalculation.

Now to get back to the requirements above. These capabilities can help fix these sorts of issues. Let's explain a few things. Recalculation can be abstracted into the following stream computing functions:

The next computation may depend on the result of the previous computation.
The source data used in the previous computation may be used for the next computation.
The next computation may change the result of the previous computation.
Recalculation can start at a previous time point. The computation result at the previous time point also needs to be obtained.

To satisfy PostgreSQL's User-Defined Functions (UDFs), async batch consume with atomicity, transaction atomicity, result persistence, support for INSERT ON CONFLICT write merging, and flash back (rolling back computing results to a certain time point) can be used to address the preceding requirements.

This article uses SDT as an example to provide a design suggestion and demo for this type of stream computing in PostgreSQL.

Demo

1. Create an input table.
2. Create a window retention table.
3. Create a result table.
4. Write data into the input table (enable pgbench, stress testing, and real-time writing).
5. Compile the batch consumption and SDT functions (PLpgSQL).

Including the recalculation logic:

Result table without records  
  
  Input table. Delete up to N records from the record set R1 in chronological order.  
  
  Window table. Insert R1.  
  
  Compute R1 and write the result to the result table.  
  
Result table with records (check whether any results need to be recalculated)  
  
  Window table. Retrieve the minimum time T2 and the maximum time T3.  
  
  Input table. Directly delete data if the data < T2.  
  
  Input table. Retrieve the minimum time T1.  
  
  Result table. Retrieve the maximum time T4.  
  
  If T1 <= T4:  
  
    Result table. Delete data if the data >= T1.  
  
    Window table. Retrieve the record set R2, which contains data >= T1.  
  
    Input table. Delete the record set R3 (where ts <= T3) in chronological order.  
  
    Window table. Insert R3.  
  
    Input table. Delete up to N records from the record set R4 in chronological order.  
  
    Window table. Insert R4.  
  
    Compute R2 + R3 + R4 and write the result to the result table.  
  
  If T1 > T4:  
      
    Window table. Retrieve the record set R5 that contains data > = T4.  
  
    Input table. Delete up to N records from the record set R6 in chronological order.  
  
    Window table. Insert R6.  
  
    Compute R5 + R6 and write the result to the result table.

6. Schedule the SDT function.

Community

PostgreSQL Swinging Door Trending (SDT) Algorithm for Stream Computing Applications

Demo

Read previous post:

Read next post:

digoal

You may also like

Comments

digoal

Related Products

PolarDB for PostgreSQL

AnalyticDB for PostgreSQL

Media Solution

Apsara Stack