In this article, we use a publicly available insurance dataset to demonstrate how PolarDB can perform graph queries to identify abnormal claim records and fraudulent groups in insurance claim scenarios. For example, you can query policies involving the claimant, or analyze the social relationships of the policyholder for potential fraudulent behavior. PolarDB enhances relational databases with graph analysis capabilities, providing support for centralized data management, and analysis within enterprises.
Graph analysis is a critical field of data science. It uses graph structures to represent data and perform various computing and analytic tasks. A graph structure consists of nodes (or vertices) and edges, with nodes representing entities and edges representing the relationships between the entities. Graph computing is commonly used in areas including social network analysis, recommendation systems, knowledge graphs, and path optimization.
PolarDB for PostgreSQL is highly compatible with Apache AGE, and supports storage and query of knowledge graphs. It allows you to use both the standard ANSI SQL and openCypher graph query language to query data within the same database cluster.
· Fully compatible with PolarDB for PostgreSQL
AGE is an extension for PolarDB for PostgreSQL. It can be used in existing PolarDB databases without the need for database reconstruction. AGE leverages all features provided by PolarDB, including transaction support, concurrency control, and various indexing and performance optimization features.
· Unified graph and relational queries
AGE enables the concurrent processing of relational and graph data. It allows you to use both SQL and the graph query language in a single query to simplify the handling of complex data models, which improves operation efficiency.
· Cypher query language supported
AGE supports the Cypher query language, which is tailored for graph databases. Its syntax is simple and flexible, providing an intuitive approach to querying and manipulating graph data.
· High performance
By combining the optimization features provided by PolarDB and indexes tailored for graph data, AGE can efficiently manage large-scale graph data and complex graph queries.
In conclusion, based on the capabilities provided by AGE, PolarDB can handle graph queries in a simple and efficient manner.
Insurance claim fraud typically involves data on patients, diseases, and claims held by insurance providers. By analyzing the connections between claim applications, diseases, and other related entities, you can identify abnormal claim records and expose potentially fraudulent groups.
The data is based on an insurance dataset which is publicly available. You can find the dataset here. It contains fundamental elements of the insurance industry. The data model can be abstracted into the following graph:
· Vertices: policyholder, incharge, claim, patient, disease.
· Edges: has_disease, policyholder_of_claim, incharge_of_claim, insured_of_claim, similar_claim, policyholder_connection.
· Properties: name, high_risk, risk_score, disease_name, similarity_score, level, claim_date, charge, and more.
The graph engine extension is supported only by PolarDB for PostgreSQL14 whose revision version is 14.12.24.0 or later.
Create the extension.
create extension age;
SET search_path = "$user", public, ag_catalog;
ALTER DATABASE <dbname>
SET search_path = "$user", public, ag_catalog;
ALTER DATABASE <dbname> SET session_preload_libraries TO 'age';
1. Create a graph by using the create_graph function in the ag_catalog namespace.
SELECT create_graph('graph');
2. Insert vertices and edges. The downloaded data file is in the CSV format and does not contain the necessary ID information. Convert the file before you import the data. A Python script is provided in Appendix to convert the data into vertices and edges that is supported by PolarDB. See the following conversion results:
● POLICYHOLDER
SELECT create_vlabel('graph','policyholder');
SELECT * FROM cypher('graph', $$ CREATE (:policyholder {policyholder_id:'PH3068',fname:'ADAM',lname:'OCHSENBEIN',risk_score:'88',high_risk:'1'}) $$ ) as (n agtype);
SELECT * FROM cypher('graph', $$ CREATE (:policyholder {policyholder_id:'PH3069',fname:'MALINDA',lname:'MEHSERLE',risk_score:'42',high_risk:'0'}) $$ ) as (n agtype);
SELECT * FROM cypher('graph', $$ CREATE (:policyholder {policyholder_id:'PH3070',fname:'SANDRA',lname:'KUHTA',risk_score:'20',high_risk:'0'}) $$ ) as (n agtype);
...
● CLAIM
- Create vlabel
SELECT create_vlabel('graph','claim');
SELECT * FROM cypher('graph', $$ CREATE (:claim {claim_id:'C3571',charge:'6517.53',claim_date:'2013-08-11 00:00:00',duration:'13',insured_id:'28523',diagnosis:'no exception',person_incharge_id:'PI23070',type:'services',policyholder_id:'PH9507'}) $$ ) as (n agtype);
SELECT * FROM cypher('graph', $$ CREATE (:claim {claim_id:'C3572',charge:'49273.65',claim_date:'2017-02-10 00:00:00',duration:'3',insured_id:'1220',diagnosis:'no exception',person_incharge_id:'PI21197',type:'services',policyholder_id:'PH406'}) $$ ) as (n agtype);
SELECT * FROM cypher('graph', $$ CREATE (:claim {claim_id:'C3573',charge:'52005.98',claim_date:'2014-06-29 00:00:00',duration:'27',insured_id:'23735',diagnosis:'no exception',person_incharge_id:'PI22361',type:'services',policyholder_id:'PH7911'}) $$ ) as (n agtype);
...
● Connections between policyholder and claim
SELECT * FROM cypher('graph', $$ MATCH (a:claim), (b:policyholder) WHERE a.claim_id = 'C1528' AND b.policyholder_id = 'PH2963' CREATE (a)-[e:RELTYPE ]->(b) RETURN e$$) as (e agtype);
SELECT * FROM cypher('graph', $$ MATCH (a:claim), (b:policyholder) WHERE a.claim_id = 'C1529' AND b.policyholder_id = 'PH1353' CREATE (a)-[e:RELTYPE ]->(b) RETURN e$$) as (e agtype);
SELECT * FROM cypher('graph', $$ MATCH (a:claim), (b:policyholder) WHERE a.claim_id = 'C1530' AND b.policyholder_id = 'PH1071' CREATE (a)-[e:RELTYPE ]->(b) RETURN e$$) as (e agtype);
SELECT * FROM cypher('graph', $$ MATCH (a:claim), (b:policyholder) WHERE a.claim_id = 'C1531' AND b.policyholder_id = 'PH8102' CREATE (a)-[e:RELTYPE ]->(b) RETURN e$$) as (e agtype);
SELECT * FROM cypher('graph', $$ MATCH (a:claim), (b:policyholder) WHERE a.claim_id = 'C1532' AND b.policyholder_id = 'PH4768' CREATE (a)-[e:RELTYPE ]->(b) RETURN e$$) as (e agtype);
...
Save the transformed results as an SQL file and use client tools such as the PostgreSQL client to import the data.
● Query the number of nodes of various types.
SELECT count(*) FROM cypher('graph', $$
MATCH (v)
RETURN v
$$) as (v agtype);
count
--------
120567
-- claim
SELECT count(*) FROM cypher('graph', $$
MATCH (v:claim)
RETURN v
$$) as (v agtype);
count
--------
100001
-- policyholder
SELECT count(*) FROM cypher('graph', $$
MATCH (v:policyholder)
RETURN v
$$) as (v agtype);
count
-------
10006
-- incharge
SELECT count(*) FROM cypher('graph', $$
MATCH (v:incharge)
RETURN v
$$) as (v agtype);
count
-------
10001
--disease
SELECT count(*) FROM cypher('graph', $$
MATCH (v:disease)
RETURN v
$$) as (v agtype);
count
-------
393
--patient
SELECT count(*) FROM cypher('graph', $$
MATCH (v:patient)
RETURN v
$$) as (v agtype);
count
-------
166
● Query the policyholder, incharge, and patient status of claim C4377.
SELECT 'policyholder_id' as type, policyholder_id FROM cypher('graph', $$
MATCH (:claim {claim_id: 'C4377'})-[]->(policyholder:policyholder)
RETURN policyholder.policyholder_id
$$) as (policyholder_id agtype)
UNION
SELECT 'incharge_id', incharge_id FROM cypher('graph', $$
MATCH (:claim {claim_id: 'C4377'})-[]->(v:incharge)
RETURN v.incharge_id
$$) as (incharge_id agtype)
UNION
SELECT 'patient_id', patient_id FROM cypher('graph', $$
MATCH (:claim {claim_id: 'C4377'})-[]->(v:patient)
RETURN v.patient_id
$$) as (patient_id agtype);
type | policyholder_id
-----------------+-----------------
patient_id | "11279"
policyholder_id | "PH3759"
incharge_id | "PI26607"
● Query the claims with the same insured patient as fraudulent policy C4377 to determine whether the claimant is involved in insurance fraud.
SELECT 'claim_id', claim_id FROM cypher('graph', $$
MATCH (:claim {claim_id: 'C4377'})-[]->(p:patient)<-[]-(c:claim)
RETURN c.claim_id
$$) as (claim_id agtype);
?column? | claim_id
----------+----------
claim_id | "C28963"
claim_id | "C3679"
claim_id | "C96545"
claim_id | "C26586"
claim_id | "C26754"
claim_id | "C87278"
claim_id | "C87603"
claim_id | "C69395"
claim_id | "C67594"
claim_id | "C96155"
claim_id | "C10160"
● Query the social relationships of the policyholder of fraudulent policy C4377 to provide early warnings about their claims.
SELECT 'policyholder_id', policyholder_id FROM cypher('graph', $$
MATCH (:claim {claim_id: 'C4377'})-[]->(a:policyholder)-[r*1..3]->(p:policyholder)
RETURN p.policyholder_id
$$) as (policyholder_id agtype);
?column? | policyholder_id
-----------------+-----------------
policyholder_id | "PH52532"
policyholder_id | "PH11283"
policyholder_id | "PH11328"
policyholder_id | "PH1"
policyholder_id | "PH5"
policyholder_id | "PH512"
policyholder_id | "PH1569"
policyholder_id | "PH4722"
policyholder_id | "PH4731"
● Query the path between policyholder PH3759 and policyholder PH4722 to check their connections.
SELECT *
FROM cypher('graph', $$
MATCH path = (:policyholder {policyholder_id: 'PH3759'})-[r*1..3]->(:policyholder {policyholder_id: 'PH4722'})
RETURN path
$$) AS (v agtype);
-------
[{"id": 844424930136988, "label": "policyholder", "properties": {"fname": "KURTIS", "lname": "ALKEMA", "high_risk": "1", "risk_score": "78", "policyholder_id": "PH3759"}}::vertex, {"id": 2251799813685487, "label": "RELTYPE", "end_id": 844424930133473, "start_id": 844424930136988, "properties": {"level": "65"}}::edge, {"id": 844424930133473, "label": "policyholder", "properties": {"fname": "TERRA", "lname": "SWARB", "high_risk": "0", "risk_score": "25", "policyholder_id": "PH512"}}::vertex, {"id": 2251799813685546, "label": "RELTYPE", "end_id": 844424930138502, "start_id": 844424930133473, "properties": {"level": "62"}}::edge, {"id": 844424930138502, "label": "policyholder", "properties": {"fname": "VETA", "lname": "SEDLACK", "high_risk": "0", "risk_score": "31", "policyholder_id": "PH1569"}}::vertex, {"id": 2251799813685594, "label": "RELTYPE", "end_id": 844424930136281, "start_id": 844424930138502, "properties": {"level": "92"}}::edge, {"id": 844424930136281, "label": "policyholder", "properties": {"fname": "DEANNA", "lname": "BALSER", "high_risk": "0", "risk_score": "36", "policyholder_id": "PH4722"}}::vertex]::path
● Query common neighbors of policies C4377 and C67594 to identify shared policyholders.
SELECT 'policyholder_id', policyholder_id FROM cypher('graph', $$
MATCH (:claim {claim_id: 'C4377'})-[]->(p:policyholder)<-[]-(:claim {claim_id: 'C67594'})
RETURN p.policyholder_id
$$) as (policyholder_id agtype);
?column? | policyholder_id
-----------------+-----------------
policyholder_id | "PH3759"
● Query policies that share the same policyholder as fraudulent policy C4377 to identify other potentially fraudulent policies.
SELECT 'claim_id', claim_id FROM cypher('graph', $$
MATCH (:claim {claim_id: 'C4377'})-[]->(p:policyholder)<-[]-(c:claim)
RETURN c.claim_id
$$) as (claim_id agtype);
?column? | claim_id
----------+----------
claim_id | "C28963"
claim_id | "C96545"
claim_id | "C3679"
claim_id | "C87603"
claim_id | "C26754"
claim_id | "C26586"
claim_id | "C87278"
claim_id | "C69395"
claim_id | "C67594"
claim_id | "C96155"
claim_id | "C10160"
● Query the top 20 policies most similar to fraudulent policy C4377:
with t as (
SELECT claim_id, replace(trim(both '"' from to_jsonb(properties)::text), '\"', '"') as similarity FROM cypher('graph', $$
MATCH (:claim {claim_id: 'C4377'})-[e]->(c:claim)
RETURN properties(e), c.claim_id
$$) as (properties agtype, claim_id agtype)
)
SELECT claim_id, replace((similarity::jsonb->'similarity_score')::text, '"','')::integer as s
from t
ORDER BY s DESC
LIMIT 20;
claim_id | s
----------+----
"C67594" | 13
"C69395" | 13
"C10160" | 13
"C87603" | 13
"C28963" | 13
"C3679" | 13
"C26754" | 13
"C96155" | 13
"C26586" | 13
"C87278" | 13
"C96545" | 13
"C20113" | 8
"C70759" | 8
"C28785" | 8
"C12793" | 8
"C59736" | 8
"C38059" | 8
"C34068" | 8
"C71827" | 8
"C15760" | 8
You can combine the graph analysis capabilities provided by PolarDB for PostgreSQL, and the AGE extension to perform graph data analysis. PolarDB provides graph data computing and analysis features, including Cypher query language support and efficient graph data query processing, to facilitate centralized data management and analysis within enterprises.
Visit the Free Trials & Special Offers page, select the PolarDB option, and try the graph computing features of GanosBase provided by PolarDB for PostgreSQL.
Data conversion script for converting CSV data.
import csv
import os
def convert_vertex_csv(file_path, graph):
file_name = os.path.splitext(os.path.basename(file_path))[0].lower()
# create vlabel
print("------------------------------------------------")
print("-- Create vlabel")
print("SELECT create_vlabel('{}','{}');".format(graph, file_name))
with open(file_path, 'r') as csvfile:
reader = csv.reader(csvfile, delimiter=',')
header = next(reader)
for row in reader:
p = ""
for h in header:
if p != "":
p += ","
else:
p += "{"
p += "{}:'{}'".format(h.lower(), row[header.index(h)].strip())
if p != "":
p += "}"
print("SELECT * FROM cypher('{}', $$ CREATE (:{} {}) $$ ) as (n agtype);".format(graph, file_name, p))
def convert_edge_csv(file_path, graph, from_type, to_type):
file_name = os.path.splitext(os.path.basename(file_path))[0].lower()
with open(file_path, 'r') as csvfile:
reader = csv.reader(csvfile, delimiter=',')
header = next(reader)
for row in reader:
p = ""
for h in header:
if (h.endswith("ID")):
continue;
if p != "":
p += ","
else:
p += "{"
p += "{}:'{}'".format(h.lower(), row[header.index(h)].strip())
if p != "":
p += "}"
print("SELECT * FROM cypher('{0}', $$ MATCH (a:{1}), (b:{2}) WHERE a.{1}_id = '{3}' AND "
"b.{2}_id = '{4}' CREATE (a)-[e:RELTYPE {5} ]->(b) RETURN e$$) as (e agtype);".format(graph, from_type, to_type, row[0].strip(), row[1].strip(), p))
def generate_graph_csv(directory, graph):
print("------------------------------------------------")
print("-- Create graph")
print("SELECT create_graph('{}');".format(graph))
print("------------------------------------------------")
print("-- Create vertex")
convert_vertex_csv(directory + "/POLICYHOLDER.csv", graph)
convert_vertex_csv(directory + "/INCHARGE.csv", graph)
convert_vertex_csv(directory + "/PATIENT.csv", graph)
convert_vertex_csv(directory + "/CLAIM.csv", graph)
convert_vertex_csv(directory + "/DISEASE.csv", graph)
print("------------------------------------------------")
print("-- Create edge")
convert_edge_csv(directory + "/POLICYHOLDER_CONNECTION.csv", graph, 'policyholder','policyholder')
convert_edge_csv(directory + "/INCHARGE_OF_CLAIM.csv", graph,'claim', 'incharge')
convert_edge_csv(directory + "/CLAIM_SIMILARITY.csv", graph, 'claim','claim')
convert_edge_csv(directory + "/POLICYHOLDER_OF_CLAIM.csv", graph,'claim', 'policyholder')
convert_edge_csv(directory + "/INSURED_OF_CLAIM.csv", graph, 'claim','patient')
convert_edge_csv(directory + "/HAS_DISEASE.csv", graph, 'patient','disease')
generate_graph_csv("analyzing-insurance-claims-using-ibm-db2-graph-master/data", "graph")
Best Practices for Importing Vector and Raster Data to GanosBase
Graph Analysis Based on PolarDB: Fast Import of Graph Data via Tables
Alipay Technology - December 26, 2019
Alibaba Clouder - October 28, 2019
ApsaraDB - March 3, 2020
Alibaba Clouder - November 29, 2017
ApsaraDB - April 19, 2019
Alibaba Clouder - July 5, 2019
Follow our step-by-step best practices guides to build your own business case.
Learn MoreAlibaba Cloud PolarDB for PostgreSQL is an in-house relational database service 100% compatible with PostgreSQL and highly compatible with the Oracle syntax.
Learn MoreAlibaba Cloud PolarDB for Xscale (PolarDB-X) is a cloud-native high-performance distributed database service independently developed by Alibaba Cloud.
Learn MoreAlibaba Cloud PolarDB for MySQL is a cloud-native relational database service 100% compatible with MySQL.
Learn MoreMore Posts by ApsaraDB