In terms of elective selection for college students, one elective can be selected by many students and one student may select multiple electives at the same time. Students and electives are in a many-to-many relationship. This situation brings about two problems:
Meeting the first requirement is very easy:
Consider 100 thousand students and 1000 electives. Each student selects 5 electives on average.
Let's see how to find which electives are relevant and sort these electives by relevancy (what other electives are also selected by students who have selected this elective). Sort the electives by the number of selections in descending order.
1. Create a table to store elective selection information:
create table xuanke(
std_id int8, -- Student ID
cos_id int -- Course
);
2. Insert 500 thousand records:
insert into xuanke select random()*99999, random()*999 from generate_series(1,500000);
3. Use arrays to store electives selected by individual students:
create table xuanke_cos_id (
std_id int8 primary key,
cos_id int[]
);
insert into xuanke_cos_id select std_id, array_agg(cos_id) from xuanke group by 1;
4. Use arrays to store information about students selecting each elective:
create table xuanke_std_id (
cos_id int primary key,
std_id int8[]
);
insert into xuanke_std_id select cos_id, array_agg(std_id) from xuanke group by 1;
5. Obtain all electives selected by students based on one elective they selected, aggregate these electives and display them by relevance:
create or replace function get_cos_id2(int8[]) returns text[] as
$$
select array_agg(unnest||':'||cnt order by cnt desc) from
(select unnest(cos_id) as unnest, count (*) as cnt
from xuanke_cos_id where std_id = any ($1) group by 1
) t;
$$
language sql strict;
6. Obtain results:
select cos_id, get_cos_id2(std_id) from xuanke_std_id;
Example results
251 | {251:495,348:9,708:8,372:7,816:7,431:6,184:6,600:6,114:6,649:6, .....
453 | {453:499,519:7,750:7,816:7,375:7,109:7,705:7,650:7,908:7, .....
The preceding query can be performed in milliseconds when parallel computing is used.
To implement the second requirement, a recursive method is required.
WITH RECURSIVE search_graph(
std_id, -- point 1
cos_id, -- point 2
depth, -- depth, starting from 1
path, -- path, stored as an array
cycle -- cycle or not cycle
) AS (
select std_id,cos_id,depth,path,cycle from (
select
std_id,
cos_id,
1 depth,
array[row(std_id,cos_id)] path,
false as cycle
from xuanke
where cos_id=?
) t
UNION ALL
select std_id,cos_id,depth,path,cycle from (
select
g.std_id,
g.cos_id,
sg.depth+1 depth,
sg.path||array[row(g.std_id,g.cos_id)] path,
(row(g.std_id,g.cos_id) = ANY(path)) as cycle
from xuanke as g, search_graph AS sg
where
g.std_id = sg.std_id
AND NOT cycle
-- and sg.depth <= ?
) t
)
SELECT * FROM search_graph;
This involves a very large volume of data and therefore it takes a very long time to return results. In fact, a very deep level does not make much difference.
PostgreSQL Practices - Application of Graph Search in Content Communities
PCC Social Media "Like" Scenario - Database Design and Performance Stress Testing
digoal - July 4, 2019
digoal - May 16, 2019
digoal - July 4, 2019
digoal - July 4, 2019
digoal - April 12, 2019
digoal - June 1, 2021
An on-demand database hosting service for PostgreSQL with automated monitoring, backup and disaster recovery capabilities
Learn MoreAlibaba Cloud PolarDB for PostgreSQL is an in-house relational database service 100% compatible with PostgreSQL and highly compatible with the Oracle syntax.
Learn MoreAn on-demand database hosting service for MySQL with automated monitoring, backup and disaster recovery capabilities
Learn MoreA financial-grade distributed relational database that features high stability, high scalability, and high performance.
Learn MoreMore Posts by digoal