Hologres path analysis functions include path details functions and path parsing functions. You can use path analysis functions to compute path details and parse the results in an efficient manner. You can also display the returned data in a Sankey diagram to view complex path information in a visualized manner. This topic describes how to use path details functions and path parsing functions in Hologres.
Background information
Path analysis functions record the distribution of paths when you use a product or a feature and the access sequence of each session, and generate an intuitive and easy-to-understand Sankey diagram of behavior paths. A Sankey diagram displays each key node in the access process and the inbound and outbound traffic between nodes. This facilitates fine-grained business operations. Path analysis functions help you understand the access to each key feature of a product, and further assist operations personnel and product personnel in subsequent business strategy optimization and product iteration. Hologres supports path details functions for path details computing and path parsing functions for path result parsing. The returned results of the functions can be directly displayed in Sankey diagrams.
Usage notes
Hologres V2.2 and later support path analysis functions. If the version of your Hologres instance is V2.1 or earlier, you can contact the Hologres technical support to upgrade your instance.
Before you use path analysis functions, you must create an extension. By default, the extension is installed in the public schema. The extension cannot be installed in other schemas.
-- Create an extension. CREATE extension flow_analysis;
NoteAn extension is created at the database level. For each database, you need to create an extension only once. Do not create duplicate extensions in a database.
Path details functions
path_analysis_detail
The path_analysis_detail
function deeply parses specified event data and returns detailed path structures in the form of a serialized array. The returned result includes key information, such as the complete path series, the parent-child relationships between events in each path, and the execution duration of each event in each path.
Syntax
path_analysis_detail( event, event_time, start_event, session_interval_sec, path_depth, path_offset, is_reverse, split_session_by_event)
Parameters
Parameter
Data type
Description
event
text (data)
The event that you want to analyze.
event_time
timestamp, timestamptz, and bigint (data)
The time series of events.
start_event
text (literal)
The event that you specify as the start event or end event.
session_interval_sec
bigint (literal)
The session interval. Unit: seconds.
If the interval between adjacent events in a session exceeds the value of this parameter, the system splits the session into two sessions. For example, if you set this parameter to 30 seconds and the interval between adjacent events exceeds 30 seconds, the system splits the session into two sessions.
path_depth
bigint (literal)
The length of an event series from the start event or end event.
path_offset
bigint (literal)
The event offset from the start event or end event. 0 specifies no offset.
For example, if you configure Event a as the start event and set the offset to 1 for the event series a, b, and c, the system performs event matching from Event b.
is_reverse
bool (literal)
Specifies whether to return event series in reverse chronological order. Valid values:
false: returns event series in chronological order from the specified start event.
true: returns event series in reverse chronological order from the specified end event.
split_session_by_event
bool (literal)
Specifies whether to split a session based on an event. Valid values:
true: splits a session based on a specified start or end event in an ordered event series.
false: splits a session based only on the value of the
session_interval_sec
parameter. This is the default value.NoteIf you set the
session_interval_sec
parameter to -1, the system splits a session based on a specified start or end event.
For example, the path (event series a, b, and c) is repeated 10 times in a session. The
pad_funnel
function records only one conversion for this session. If you want to record each repeated path as a new session and count one conversion for each session, you can set this parameter to true to split the session. This way, each repeated path can be recorded as one session and one conversion is counted for each session.Return value
path_analysis_detail: A value of the
TEXT
type is returned. Example:{"",\x01a\x01b<,\x01b\x01c<,""}
.NoteThe return value is a serialized array and cannot be directly read. You can use a path parsing function to read the return value.
pad_funnel
The pad_funnel
function is used to obtain the subpath of a combination of specific events.
Syntax
pad_funnel(path_analysis_detail(), target_path)
Parameters
Parameter
Data type
Description
path_analysis_detail()
text (data)
The array of aggregated results of the path details returned by the path_analysis_detail function.
target_path
text (literal)
The event series.
Return value
path_analysis_detail: A value of the
TEXT
type is returned. Example:text []{"",\x01a\x01b<,\x01b\x01c<,""}
.NoteThe return value is a serialized array and cannot be directly read. You can use a path parsing function to read the return value.
Path parsing functions
pad_full_path
The pad_full_path
function can perform comprehensive path parsing on the return value of the path_analysis_detail
function. The pad_full_path function can also parse the full path, subpath, and time consumption of each event. This helps business personnel observe paths more intuitively.
Syntax
pad_full_path(path_analysis_detail())
Parameters
path_analysis_detail(): the array of aggregated results of the path details returned by the path_analysis_detail function.
Return value
The following code shows the result returned by the
pad_full_path
function:pad_sub_path_left(unnested_pad_result) pad_sub_path_right(unnested_pad_result) pad_sub_index_left(unnested_pad_result) pad_sub_index_right(unnested_pad_result) pad_sub_cost(unnested_pad_result) pad_sub_session(unnested_pad_result)
The following table describes the parameters in the preceding code.
Parameter
Data type
Description
pad_sub_path_left
text
The start event of the subpath.
pad_sub_path_right
text
The end event of the subpath.
pad_sub_index_left
bigint
The sequence number of the start event of the subpath in the full path of the session to which the subpath belongs.
pad_sub_index_right
bigint
The sequence number of the end event of the subpath in the full path of the session to which the subpath belongs.
pad_sub_cost
bigint
The redirection duration of the subpath. Unit: seconds.
pad_sub_session
bigint
The sequence number of the valid session to which the subpath belongs.
pad_session_path_array
The pad_session_path_array
function can accurately extract the event series in a session based on the specified session ID and organize the event series in order by path prefix.
Syntax
pad_session_path_array(path_analysis_detail(), session_idx)
Parameters
path_analysis_detail(): the array of aggregated results of the path details returned by the path_analysis_detail function.
session_idx: the session ID.
Return value
A value of the ARRAY type is returned.
Examples
Prepare data
-- Create an extension. An extension is created at the database level. For each database, you need to create an extension only once.
CREATE extension flow_analysis;
-- Prepare data.
CREATE TABLE path_demo(
uid text,
event text,
event_time timestamptz
);
INSERT INTO path_demo VALUES
('1','Register','2023-11-24 16:01:23+08'),
('1','Log on','2023-11-24 16:02:10+08'),
('1','Browse','2023-11-24 16:02:15+08'),
('1','View live streams','2023-11-24 16:03:10+08'),
('1','Browse','2023-11-24 16:03:15+08'),
('1','Add to favorites','2023-11-24 16:04:20+08'),
('1','Browse','2023-11-24 16:07:21+08'),
('1','Purchase','2023-11-24 16:08:23+08'),
('1','Exit','2023-11-24 16:09:05+08'),
('2','Log on','2023-11-24 16:10:23+08'),
('2','Purchase','2023-11-24 16:12:23+08'),
('3','Log on','2023-11-24 16:02:23+08'),
('3','Browse','2023-11-24 16:02:23+08'),
('3','Add to favorites','2023-11-24 16:03:53+08'),
('3','View live streams','2023-11-24 16:04:53+08'),
('4','Log on','2023-11-24 16:02:23+08'),
('4','Browse','2023-11-24 16:03:53+08'),
('4','Purchase','2023-11-24 16:04:23+08'),
('4','View live streams','2023-11-24 16:05:53+08'),
('4','Cancel the order','2023-11-24 16:06:53+08');
Example 1: Obtain all event paths
Split a session based on the time: Specify the start event, set the session_interval_sec parameter to 180 seconds, and then set the path_depth parameter to 7.
-- Split a session based on the time: Specify "Log on" as the start event, set the session_interval_sec parameter to 180 seconds, set the path_depth parameter to 7, and then use the pad_full_path function to parse the result. SELECT uid, pad_full_path(path_analysis_detail(event, event_time, 'Log on', 180, 7, 0, false)) AS ret FROM path_demo GROUP BY uid;
The following code shows the returned result:
uid | ret -----+--------------------------------------------------- 3 | {Log on -> Add to favorites -> View live streams} 4 | {Log on -> Browse-> Purchase -> View live streams -> Cancel the order} 1 | {Log on -> Browse-> View live streams -> Browse -> Add to favorites} 2 | {Log on -> Purchase} (4 rows)
Split a session based on the time and an event: Specify the start event, set the session_interval_sec parameter to 180 seconds, and then set the path_depth parameter to 7.
-- Split a session based on the time and an event: Specify "Browse" as the start event, set the session_interval_sec parameter to 180 seconds, set the path_depth parameter to 7, and then use the pad_full_path function to parse the result. SELECT uid, pad_full_path(path_analysis_detail(event, event_time, 'Browse', 180, 7, 0, false,TRUE)) AS ret FROM path_demo GROUP BY uid;
The following code shows the returned result:
uid | ret -----+----------------------------------------------------------- 1 | {Browse -> View live streams, Browse -> Add to favorites, Browse -> Purchase -> Exit} 2 | {} 4 | {Browse -> Purchase -> View live streams -> Cancel the order} 3 | {Browse -> Log on -> Add to favorites -> View live streams}
Example 2: Expand the path result
-- Expand paths.
SELECT uid, unnest(pad_full_path(path_analysis_detail(event, event_time, 'Log on', 180, 7, 0, false))) AS ret FROM path_demo GROUP BY uid;
For more information about the UNNEST clause, see UNNEST clause.
The following code shows the returned result:
uid | ret
-----+-------------------------------------------------
3 | Log on -> Add to favorites -> View live streams
1 | Log on -> Browse -> View live streams -> Browse -> Add to favorites
2 | Log on -> Purchase
4 | Log on -> Browse -> Purchase -> View live streams -> Cancel the order
(4 rows)
Example 3: Expand subpaths and obtain the path details of each event
-- Expand subpaths.
SELECT
uid,
pad_sub_session (ret) AS session_id,
pad_sub_path_left (ret) AS sub_path_left,
pad_sub_path_right (ret) AS sub_path_right,
pad_sub_index_left (ret) AS sub_index_left,
pad_sub_index_right (ret) AS sub_index_right,
pad_sub_cost (ret) AS sub_cost
FROM (
SELECT
uid,
unnest( path_analysis_detail (event, event_time, 'Log on', 180, 7, 0, FALSE)) AS ret
FROM
path_demo
GROUP BY
uid) a ;
The following code shows the returned result:
uid | session_id | sub_path_left | sub_path_right | sub_index_left | sub_index_right | sub_cost
-----+------------+---------------+----------------+----------------+-----------------+----------
1 | 0 | | Log on | -1 | 0 | 0
1 | 0 | Log on | Browse | 0 | 1 | 5
1 | 0 | Browse | View live streams | 1 | 2 | 55
1 | 0 | View live streams | Browse | 2 | 3 | 5
1 | 0 | Browse | Add to favorites | 3 | 4 | 65
2 | 0 | | Log on | -1 | 0 | 0
2 | 0 | Log on | Purchase | 0 | 1 | 120
3 | 0 | | Log on | -1 | 0 | 0
3 | 0 | Log on | Add to favorites | 0 | 1 | 90
3 | 0 | Add to favorites | View live streams | 1 | 2 | 60
4 | 0 | | Log on | -1 | 0 | 0
4 | 0 | Log on | Browse | 0 | 1 | 90
4 | 0 | Browse | Purchase | 1 | 2 | 30
4 | 0 | Purchase | View live streams | 2 | 3 | 90
4 | 0 | View live streams | Cancel the order | 3 | 4 | 60
(15 rows)
Example 4: Obtain the event series in a specific session
SELECT
uid,
pad_session_path_array (path_analysis_detail (event, event_time, 'Log on', 180, 7, 0,FALSE), 0) AS ret
FROM
path_demo
GROUP BY
uid;
The following code shows the returned result:
uid | ret
-----+-----------------------------------------------
1 | {Log on,Browse,View live streams,Browse,Add to favorites}
2 | {Log on,Purchase}
3 | {Log on,Add to favorites,View live streams}
4 | {Log on,Browse,Purchase,View live streams,Cancel the order}
(4 rows)
Example 5: Calculate the PVs and UVs in each subpath without deduplication
-- Calculate the page views (PVs) and unique visitors (UVs) in each subpath without deduplication. If deduplication is required, you can deduplicate data based on the value of uid.
SELECT
sub_index,
sub_path_left,
sub_path_right,
count(uid)
FROM (
SELECT
uid,
pad_sub_path_left (ret) AS sub_path_left,
pad_sub_path_right (ret) AS sub_path_right,
pad_sub_index_right (ret) AS sub_index
FROM (
SELECT
uid,
unnest(path_analysis_detail (event, event_time, 'Log on', 180, 7, 0, FALSE)) AS ret
FROM
path_demo
GROUP BY
uid) a) a
GROUP BY
sub_index,
sub_path_left,
sub_path_right
ORDER BY
sub_index,
sub_path_left,
sub_path_right;
The following code shows the returned result:
sub_index | sub_path_left | sub_path_right | count
-----------+---------------+----------------+-------
0 | | Log on | 4
1 | Log on | Add to favorites | 1
1 | Log on | Browse | 2
1 | Log on | Purchase | 1
2 | Add to favorites | View live streams | 1
2 | Browse | View live streams | 1
2 | Browse | Purchase | 1
3 | View live streams | Browse | 1
3 | Purchase | View live streams | 1
4 | Browse | Add to favorites | 1
4 | View live streams | Cancel the order | 1
(11 rows)
Example 6: Calculate the average duration for each subpath
-- Calculate the average duration for each subpath.
SELECT
sub_path_left,
sub_path_right,
avg(sub_cost)
FROM (
SELECT
uid,
pad_sub_path_left (ret) AS sub_path_left,
pad_sub_path_right (ret) AS sub_path_right,
pad_sub_cost (ret) AS sub_cost
FROM (
SELECT
uid,
unnest(path_analysis_detail (event, event_time, 'Log on', 180, 7, 0, FALSE)) AS ret
FROM
path_demo
GROUP BY
uid) a) a
GROUP BY
sub_path_left,
sub_path_right
ORDER BY
sub_path_left,
sub_path_right;
The following code shows the returned result:
sub_path_left | sub_path_right | avg
---------------+----------------+------------
Add to favorites | View live streams | 60.000000
Browse | Add to favorites | 65.000000
Browse | View live streams | 55.000000
Browse | Purchase | 30.000000
Log on | Add to favorites | 90.000000
Log on | Browse | 47.500000
Log on | Purchase | 120.000000
View live streams | Cancel the order | 60.000000
View live streams | Browse | 5.000000
Purchase | View live streams | 90.000000
| Log on | 0.000000
(11 rows)
Example 7: Query the association between the full path and subpaths in a session
-- Query the association between the full path and subpaths in a session.
select
uid,
pad_sub_session(item) as session_id,
full_path [pad_sub_session(item)+1] as full_path,
pad_sub_path_left(item) as sub_path_left,
pad_sub_path_right(item) as sub_path_right,
pad_sub_index_right(item) as sub_idx,
pad_sub_cost(item) as sub_cost
from
(
select
uid,
unnest(ret) as item,
pad_full_path(ret) as full_path
from
(
select
uid,
path_analysis_detail(event, event_time, 'Log on', 180, 7, 0, false) as ret
from
path_demo
group by
uid
) a
) a;
The following code shows the returned result:
uid | session_id | full_path | sub_path_left | sub_path_right | sub_idx | sub_cost
-----+------------+-------------------------------------------------+---------------+----------------+---------+----------
3 | 0 | Log on -> Add to favorites -> View live streams | | Log on | 0 | 0
3 | 0 | Log on -> Add to favorites -> View live streams | Log on | Add to favorites | 1 | 90
3 | 0 | Log on -> Add to favorites -> View live streams | Add to favorites | View live streams | 2 | 60
1 | 0 | Log on -> Browse -> View live streams -> Browse -> Add to favorites | | Log on | 0 | 0
1 | 0 | Log on -> Browse -> View live streams -> Browse -> Add to favorites | Log on | Browse | 1 | 5
1 | 0 | Log on -> Browse -> View live streams -> Browse -> Add to favorites | Browse | View live streams | 2 | 55
1 | 0 | Log on -> Browse -> View live streams -> Browse -> Add to favorites | View live streams | Browse | 3 | 5
1 | 0 | Log on -> Browse -> View live streams -> Browse -> Add to favorites | Browse | Add to favorites | 4 | 65
2 | 0 | Log on -> Purchase | | Log on | 0 | 0
2 | 0 | Log on -> Purchase | Log on | Purchase | 1 | 120
4 | 0 | Log on -> Browse -> Purchase -> View live streams -> Cancel the order | | Log on | 0 | 0
4 | 0 | Log on -> Browse -> Purchase -> View live streams -> Cancel the order | Log on | Browse | 1 | 90
4 | 0 | Log on -> Browse -> Purchase -> View live streams -> Cancel the order | Browse | Purchase | 2 | 30
4 | 0 | Log on -> Browse -> Purchase -> View live streams -> Cancel the order | Purchase | View live streams | 3 | 90
4 | 0 | Log on -> Browse -> Purchase -> View live streams -> Cancel the order | View live streams | Cancel the order | 4 | 60
(15 rows)
Example 8: View the details of a specified path
-- You can use the pad_tunnel function to view the details of a specified path. In this example, details or subpath information of the Browse > Purchase path is queried.
SELECT uid, pad_full_path(pad_funnel(path_analysis_detail(event, event_time, 'Log on', 180, 7, 0, false), array['Log on', 'Purchase'])) AS ret FROM path_demo GROUP BY uid;
The following code shows the returned result:
uid | ret
-----+------------------
3 | {Log on}
4 | {Log on -> Purchase}
1 | {Log on}
2 | {Log on -> Purchase}
(4 rows)