Sometimes a business may want to know what will happen following a specific event. For example, will publishing house price regulations lead to large investment in the stock market?
Let's take a look at how PostgreSQL databases can meet this type of requirement.
1. Construct the structure of a table holding events
create table event(
id serial8 primary key, -- Auto-increment sequence that determines the sequential order of events
class text, -- Event type
info text, -- Event description
crt_time timestamp -- Time when an event occurs
);
2. Insert 10 million event records, which are of 5,000 different event types
insert into event (class, info , crt_time) select (5000*random())::int::text, 'test', clock_timestamp() from generate_series(1,10000000);
3. Create indexes by event type
create index idx_event_class on event (class);
Query a specific number of events following a certain event
Simply use the following function to meet the preceding requirement
create or replace function f(
sql text, -- Query the IDs of the target events to be analyzed
v_class text, -- Type of a target event
v_n int8, -- How many subsequent events are to be checked
v_limit int -- How many times events are analyzed. If all output is required, enter a large value
) returns setof event as
$$
declare
v_id int8;
begin
for v_id in execute sql loop
return query select * from event where id>=v_id order by id limit v_n+1;
v_limit := v_limit - 1;
if v_limit<=0 then
return;
end if;
end loop;
end;
$$
language plpgsql strict;
Query example
Query two events following an event of type 1 and generate analysis results ten times.
postgres=# select * from f(
$$
select id from event where class='1'
$$
, '1', 2, 10);
id | class | info | crt_time
-------+-------+------+----------------------------
1592 | 1 | test | 2017-03-31 15:07:23.77348
1593 | 3032 | test | 2017-03-31 15:07:23.773483
1594 | 3409 | test | 2017-03-31 15:07:23.773486
2784 | 1 | test | 2017-03-31 15:07:23.777265
2785 | 2819 | test | 2017-03-31 15:07:23.777268
2786 | 87 | test | 2017-03-31 15:07:23.777271
12176 | 1 | test | 2017-03-31 15:07:23.807489
12177 | 2586 | test | 2017-03-31 15:07:23.807491
12178 | 4101 | test | 2017-03-31 15:07:23.807494
19398 | 1 | test | 2017-03-31 15:07:23.83072
19399 | 1179 | test | 2017-03-31 15:07:23.830723
19400 | 4237 | test | 2017-03-31 15:07:23.830726
19571 | 1 | test | 2017-03-31 15:07:23.831296
19572 | 4368 | test | 2017-03-31 15:07:23.831299
19573 | 2313 | test | 2017-03-31 15:07:23.831302
24708 | 1 | test | 2017-03-31 15:07:23.847794
24709 | 1327 | test | 2017-03-31 15:07:23.847797
24710 | 4584 | test | 2017-03-31 15:07:23.847799
29756 | 1 | test | 2017-03-31 15:07:23.864234
29757 | 4386 | test | 2017-03-31 15:07:23.864237
29758 | 3044 | test | 2017-03-31 15:07:23.864239
30224 | 1 | test | 2017-03-31 15:07:23.865765
30225 | 4704 | test | 2017-03-31 15:07:23.865768
30226 | 332 | test | 2017-03-31 15:07:23.865771
32004 | 1 | test | 2017-03-31 15:07:23.871554
32005 | 219 | test | 2017-03-31 15:07:23.871557
32006 | 3548 | test | 2017-03-31 15:07:23.871559
36472 | 1 | test | 2017-03-31 15:07:23.886097
36473 | 640 | test | 2017-03-31 15:07:23.8861
36474 | 1139 | test | 2017-03-31 15:07:23.886103
(30 rows)
With the preceding results, we can know what events will happen after the occurrence of event 1.
To use the preceding results for further analysis, use hotspot analysis similar to keywords, use the MADlib machine learning database in PostgreSQL or use plR.
You always have a method to find the correlation between events.
Because the previous analysis targets independent events, which bear no correlation themselves, it is required to use functions to generate results.
Users' mouse clicks on Taobao (for example, what other items a user clicks after clicking a specific item) is an example of non-independent events.
For non-independent events, the structure design includes associated fields like USERID, so that we can easily establish association.
Trajectory analysis can be performed on non-independent events by using recursive calls in PostgreSQL.
Recursive syntax in PostgreSQL and PL/pgSQL can perfectly meet the requirements of analyzing events before and after a specific event in public opinion analysis scenarios.
Then hotspot analysis, cluster analysis, MADlib in PostgreSQL or plR can be used to analyze events to find event correlation.
PostgreSQL Independent Event Correlation Analysis - User-Car Fitting
Alibaba Clouder - June 12, 2018
Alibaba Cloud Storage - February 27, 2020
Farruh - August 11, 2023
JwdShah - February 7, 2024
Alibaba Clouder - July 22, 2020
Alibaba Cloud Project Hub - March 19, 2024
Alibaba Cloud PolarDB for PostgreSQL is an in-house relational database service 100% compatible with PostgreSQL and highly compatible with the Oracle syntax.
Learn MoreAn online MPP warehousing service based on the Greenplum Database open source program
Learn MoreHighly reliable and secure deployment solutions for enterprises to fully experience the unique benefits of the hybrid cloud
Learn MoreThis solution helps Internet Data Center (IDC) operators and telecommunication operators build a local public cloud from scratch.
Learn MoreMore Posts by digoal