All Products
Search
Document Center

Hologres:Path analysis functions

Last Updated:May 23, 2024

Hologres path analysis functions include path details functions and path parsing functions. You can use path analysis functions to compute path details and parse the results in an efficient manner. You can also display the returned data in a Sankey diagram to view complex path information in a visualized manner. This topic describes how to use path details functions and path parsing functions in Hologres.

Background information

Path analysis functions record the distribution of paths when you use a product or a feature and the access sequence of each session, and generate an intuitive and easy-to-understand Sankey diagram of behavior paths. A Sankey diagram displays each key node in the access process and the inbound and outbound traffic between nodes. This facilitates fine-grained business operations. Path analysis functions help you understand the access to each key feature of a product, and further assist operations personnel and product personnel in subsequent business strategy optimization and product iteration. Hologres supports path details functions for path details computing and path parsing functions for path result parsing. The returned results of the functions can be directly displayed in Sankey diagrams.

Usage notes

  • Hologres V2.2 and later support path analysis functions. If the version of your Hologres instance is V2.1 or earlier, you can contact the Hologres technical support to upgrade your instance.

  • Before you use path analysis functions, you must create an extension. By default, the extension is installed in the public schema. The extension cannot be installed in other schemas.

    -- Create an extension.
    CREATE extension flow_analysis;
    Note

    An extension is created at the database level. For each database, you need to create an extension only once. Do not create duplicate extensions in a database.

Path details functions

path_analysis_detail

The path_analysis_detail function deeply parses specified event data and returns detailed path structures in the form of a serialized array. The returned result includes key information, such as the complete path series, the parent-child relationships between events in each path, and the execution duration of each event in each path.

  • Syntax

    path_analysis_detail(
      event, 
    	event_time, 
    	start_event, 
    	session_interval_sec, 
    	path_depth, 
    	path_offset, 
    	is_reverse,
    split_session_by_event)
  • Parameters

    Parameter

    Data type

    Description

    event

    text (data)

    The event that you want to analyze.

    event_time

    timestamp, timestamptz, and bigint (data)

    The time series of events.

    start_event

    text (literal)

    The event that you specify as the start event or end event.

    session_interval_sec

    bigint (literal)

    The session interval. Unit: seconds.

    If the interval between adjacent events in a session exceeds the value of this parameter, the system splits the session into two sessions. For example, if you set this parameter to 30 seconds and the interval between adjacent events exceeds 30 seconds, the system splits the session into two sessions.

    path_depth

    bigint (literal)

    The length of an event series from the start event or end event.

    path_offset

    bigint (literal)

    The event offset from the start event or end event. 0 specifies no offset.

    For example, if you configure Event a as the start event and set the offset to 1 for the event series a, b, and c, the system performs event matching from Event b.

    is_reverse

    bool (literal)

    Specifies whether to return event series in reverse chronological order. Valid values:

    • false: returns event series in chronological order from the specified start event.

    • true: returns event series in reverse chronological order from the specified end event.

    split_session_by_event

    bool (literal)

    Specifies whether to split a session based on an event. Valid values:

    • true: splits a session based on a specified start or end event in an ordered event series.

    • false: splits a session based only on the value of the session_interval_sec parameter. This is the default value.

      Note

      If you set the session_interval_sec parameter to -1, the system splits a session based on a specified start or end event.

    For example, the path (event series a, b, and c) is repeated 10 times in a session. The pad_funnel function records only one conversion for this session. If you want to record each repeated path as a new session and count one conversion for each session, you can set this parameter to true to split the session. This way, each repeated path can be recorded as one session and one conversion is counted for each session.

  • Return value

    path_analysis_detail: A value of the TEXT type is returned. Example: {"",\x01a\x01b<,\x01b\x01c<,""}.

    Note

    The return value is a serialized array and cannot be directly read. You can use a path parsing function to read the return value.

pad_funnel

The pad_funnel function is used to obtain the subpath of a combination of specific events.

  • Syntax

    pad_funnel(path_analysis_detail(), target_path)
  • Parameters

    Parameter

    Data type

    Description

    path_analysis_detail()

    text (data)

    The array of aggregated results of the path details returned by the path_analysis_detail function.

    target_path

    text (literal)

    The event series.

  • Return value

    path_analysis_detail: A value of the TEXT type is returned. Example: text []{"",\x01a\x01b<,\x01b\x01c<,""}.

    Note

    The return value is a serialized array and cannot be directly read. You can use a path parsing function to read the return value.

Path parsing functions

pad_full_path

The pad_full_path function can perform comprehensive path parsing on the return value of the path_analysis_detail function. The pad_full_path function can also parse the full path, subpath, and time consumption of each event. This helps business personnel observe paths more intuitively.

  • Syntax

    pad_full_path(path_analysis_detail())
  • Parameters

    path_analysis_detail(): the array of aggregated results of the path details returned by the path_analysis_detail function.

  • Return value

    • The following code shows the result returned by the pad_full_path function:

      pad_sub_path_left(unnested_pad_result)
      pad_sub_path_right(unnested_pad_result)
      pad_sub_index_left(unnested_pad_result)
      pad_sub_index_right(unnested_pad_result)
      pad_sub_cost(unnested_pad_result)
      pad_sub_session(unnested_pad_result)
    • The following table describes the parameters in the preceding code.

      Parameter

      Data type

      Description

      pad_sub_path_left

      text

      The start event of the subpath.

      pad_sub_path_right

      text

      The end event of the subpath.

      pad_sub_index_left

      bigint

      The sequence number of the start event of the subpath in the full path of the session to which the subpath belongs.

      pad_sub_index_right

      bigint

      The sequence number of the end event of the subpath in the full path of the session to which the subpath belongs.

      pad_sub_cost

      bigint

      The redirection duration of the subpath. Unit: seconds.

      pad_sub_session

      bigint

      The sequence number of the valid session to which the subpath belongs.

pad_session_path_array

The pad_session_path_array function can accurately extract the event series in a session based on the specified session ID and organize the event series in order by path prefix.

  • Syntax

    pad_session_path_array(path_analysis_detail(), session_idx)
  • Parameters

    • path_analysis_detail(): the array of aggregated results of the path details returned by the path_analysis_detail function.

    • session_idx: the session ID.

  • Return value

    A value of the ARRAY type is returned.

Examples

Prepare data

-- Create an extension. An extension is created at the database level. For each database, you need to create an extension only once.
CREATE extension flow_analysis;
-- Prepare data.
CREATE TABLE path_demo( 
    uid text,
    event text,
    event_time timestamptz
);

INSERT INTO path_demo VALUES
('1','Register','2023-11-24 16:01:23+08'),
('1','Log on','2023-11-24 16:02:10+08'),
('1','Browse','2023-11-24 16:02:15+08'),
('1','View live streams','2023-11-24 16:03:10+08'),
('1','Browse','2023-11-24 16:03:15+08'),
('1','Add to favorites','2023-11-24 16:04:20+08'),
('1','Browse','2023-11-24 16:07:21+08'),
('1','Purchase','2023-11-24 16:08:23+08'),
('1','Exit','2023-11-24 16:09:05+08'),

('2','Log on','2023-11-24 16:10:23+08'),
('2','Purchase','2023-11-24 16:12:23+08'),

('3','Log on','2023-11-24 16:02:23+08'),
('3','Browse','2023-11-24 16:02:23+08'),
('3','Add to favorites','2023-11-24 16:03:53+08'),
('3','View live streams','2023-11-24 16:04:53+08'),

('4','Log on','2023-11-24 16:02:23+08'),
('4','Browse','2023-11-24 16:03:53+08'),
('4','Purchase','2023-11-24 16:04:23+08'),
('4','View live streams','2023-11-24 16:05:53+08'),
('4','Cancel the order','2023-11-24 16:06:53+08');

Example 1: Obtain all event paths

  • Split a session based on the time: Specify the start event, set the session_interval_sec parameter to 180 seconds, and then set the path_depth parameter to 7.

    -- Split a session based on the time: Specify "Log on" as the start event, set the session_interval_sec parameter to 180 seconds, set the path_depth parameter to 7, and then use the pad_full_path function to parse the result.
    SELECT uid, pad_full_path(path_analysis_detail(event, event_time, 'Log on', 180, 7, 0, false)) AS ret FROM path_demo GROUP BY uid;

    The following code shows the returned result:

     uid |                        ret
    -----+---------------------------------------------------
     3   | {Log on -> Add to favorites -> View live streams}
     4   | {Log on -> Browse-> Purchase -> View live streams -> Cancel the order}
     1   | {Log on -> Browse-> View live streams -> Browse -> Add to favorites}
     2   | {Log on -> Purchase}
    (4 rows)
  • Split a session based on the time and an event: Specify the start event, set the session_interval_sec parameter to 180 seconds, and then set the path_depth parameter to 7.

    -- Split a session based on the time and an event: Specify "Browse" as the start event, set the session_interval_sec parameter to 180 seconds, set the path_depth parameter to 7, and then use the pad_full_path function to parse the result.
    SELECT uid, pad_full_path(path_analysis_detail(event, event_time, 'Browse', 180, 7, 0, false,TRUE)) AS ret FROM path_demo GROUP BY uid;

    The following code shows the returned result:

     uid |                            ret
    -----+-----------------------------------------------------------
     1   | {Browse -> View live streams, Browse -> Add to favorites, Browse -> Purchase -> Exit}
     2   | {}
     4   | {Browse -> Purchase -> View live streams -> Cancel the order}
     3   | {Browse -> Log on -> Add to favorites -> View live streams}

Example 2: Expand the path result

-- Expand paths.
SELECT uid, unnest(pad_full_path(path_analysis_detail(event, event_time, 'Log on', 180, 7, 0, false))) AS ret FROM path_demo GROUP BY uid;
Note

For more information about the UNNEST clause, see UNNEST clause.

The following code shows the returned result:

 uid |                       ret
-----+-------------------------------------------------
 3   | Log on -> Add to favorites -> View live streams
 1   | Log on -> Browse -> View live streams -> Browse -> Add to favorites
 2   | Log on -> Purchase
 4   | Log on -> Browse -> Purchase -> View live streams -> Cancel the order
(4 rows)

Example 3: Expand subpaths and obtain the path details of each event

-- Expand subpaths.
SELECT
    uid,
    pad_sub_session (ret) AS session_id,
    pad_sub_path_left (ret) AS sub_path_left,
    pad_sub_path_right (ret) AS sub_path_right,
    pad_sub_index_left (ret) AS sub_index_left,
    pad_sub_index_right (ret) AS sub_index_right,
    pad_sub_cost (ret) AS sub_cost
FROM (
    SELECT
        uid,
        unnest( path_analysis_detail (event, event_time, 'Log on', 180, 7, 0, FALSE)) AS ret
    FROM
        path_demo 
    GROUP BY
        uid) a ;

The following code shows the returned result:

  uid | session_id | sub_path_left | sub_path_right | sub_index_left | sub_index_right | sub_cost
-----+------------+---------------+----------------+----------------+-----------------+----------
 1   |          0 |               | Log on         |             -1 |               0 |        0
 1   |          0 | Log on        | Browse         |              0 |               1 |        5
 1   |          0 | Browse        | View live streams      |              1 |               2 |       55
 1   |          0 | View live streams     | Browse         |              2 |               3 |        5
 1   |          0 | Browse        | Add to favorites         |              3 |               4 |       65
 2   |          0 |               | Log on         |             -1 |               0 |        0
 2   |          0 | Log on        | Purchase         |              0 |               1 |      120
 3   |          0 |               | Log on         |             -1 |               0 |        0
 3   |          0 | Log on        | Add to favorites         |              0 |               1 |       90
 3   |          0 | Add to favorites        | View live streams      |              1 |               2 |       60
 4   |          0 |               | Log on         |             -1 |               0 |        0
 4   |          0 | Log on        | Browse         |              0 |               1 |       90
 4   |          0 | Browse        | Purchase         |              1 |               2 |       30
 4   |          0 | Purchase        | View live streams      |              2 |               3 |       90
 4   |          0 | View live streams     | Cancel the order   |              3 |               4 |       60
(15 rows)

Example 4: Obtain the event series in a specific session

SELECT
    uid,
    pad_session_path_array (path_analysis_detail (event, event_time, 'Log on', 180, 7, 0,FALSE), 0) AS ret
FROM
    path_demo
GROUP BY
    uid;

The following code shows the returned result:

 uid |                      ret
-----+-----------------------------------------------
 1   | {Log on,Browse,View live streams,Browse,Add to favorites}
 2   | {Log on,Purchase}
 3   | {Log on,Add to favorites,View live streams}
 4   | {Log on,Browse,Purchase,View live streams,Cancel the order}
(4 rows)

Example 5: Calculate the PVs and UVs in each subpath without deduplication

-- Calculate the page views (PVs) and unique visitors (UVs) in each subpath without deduplication. If deduplication is required, you can deduplicate data based on the value of uid.
SELECT
    sub_index,
    sub_path_left,
    sub_path_right,
    count(uid)
FROM (
    SELECT
        uid,
        pad_sub_path_left (ret) AS sub_path_left,
        pad_sub_path_right (ret) AS sub_path_right,
        pad_sub_index_right (ret) AS sub_index
    FROM (
        SELECT
            uid,
            unnest(path_analysis_detail (event, event_time, 'Log on', 180, 7, 0, FALSE)) AS ret
        FROM
            path_demo
        GROUP BY
            uid) a) a
GROUP BY
    sub_index,
    sub_path_left,
    sub_path_right
ORDER BY
    sub_index,
    sub_path_left,
    sub_path_right;

The following code shows the returned result:

 sub_index | sub_path_left | sub_path_right | count
-----------+---------------+----------------+-------
         0 |               | Log on         |     4
         1 | Log on        | Add to favorites         |     1
         1 | Log on        | Browse         |     2
         1 | Log on        | Purchase         |     1
         2 | Add to favorites        | View live streams      |     1
         2 | Browse        | View live streams      |     1
         2 | Browse        | Purchase         |     1
         3 | View live streams     | Browse         |     1
         3 | Purchase        | View live streams      |     1
         4 | Browse        | Add to favorites         |     1
         4 | View live streams     | Cancel the order   |     1
(11 rows)

Example 6: Calculate the average duration for each subpath

-- Calculate the average duration for each subpath.
SELECT
    sub_path_left,
    sub_path_right,
    avg(sub_cost)
FROM (
    SELECT
        uid,
        pad_sub_path_left (ret) AS sub_path_left,
        pad_sub_path_right (ret) AS sub_path_right,
        pad_sub_cost (ret) AS sub_cost
    FROM (
        SELECT
            uid,
            unnest(path_analysis_detail (event, event_time, 'Log on', 180, 7, 0, FALSE)) AS ret
        FROM
            path_demo
        GROUP BY
            uid) a) a
GROUP BY
    sub_path_left,
    sub_path_right
ORDER BY
    sub_path_left,
    sub_path_right;

The following code shows the returned result:

 sub_path_left | sub_path_right |    avg
---------------+----------------+------------
 Add to favorites        | View live streams      |  60.000000
 Browse        | Add to favorites         |  65.000000
 Browse        | View live streams      |  55.000000
 Browse        | Purchase         |  30.000000
 Log on        | Add to favorites         |  90.000000
 Log on        | Browse         |  47.500000
 Log on        | Purchase         | 120.000000
 View live streams     | Cancel the order   |  60.000000
 View live streams     | Browse         |   5.000000
 Purchase        | View live streams      |  90.000000
               | Log on         |   0.000000
(11 rows)

Example 7: Query the association between the full path and subpaths in a session

-- Query the association between the full path and subpaths in a session.
select
    uid,
    pad_sub_session(item) as session_id,
    full_path [pad_sub_session(item)+1] as full_path,
    pad_sub_path_left(item) as sub_path_left,
    pad_sub_path_right(item) as sub_path_right,
    pad_sub_index_right(item) as sub_idx,
    pad_sub_cost(item) as sub_cost
from
    (
        select
            uid,
            unnest(ret) as item,
            pad_full_path(ret) as full_path
        from
            (
                select
                    uid,
                    path_analysis_detail(event, event_time, 'Log on', 180, 7, 0, false) as ret
                from
                    path_demo
                group by
                    uid
            ) a
    ) a;

The following code shows the returned result:

uid | session_id |                    full_path                    | sub_path_left | sub_path_right | sub_idx | sub_cost
-----+------------+-------------------------------------------------+---------------+----------------+---------+----------
 3   |          0 | Log on -> Add to favorites -> View live streams                       |               | Log on         |       0 |        0
 3   |          0 | Log on -> Add to favorites -> View live streams                       | Log on        | Add to favorites         |       1 |       90
 3   |          0 | Log on -> Add to favorites -> View live streams                       | Add to favorites        | View live streams      |       2 |       60
 1   |          0 | Log on -> Browse -> View live streams -> Browse -> Add to favorites      |               | Log on         |       0 |        0
 1   |          0 | Log on -> Browse -> View live streams -> Browse -> Add to favorites       | Log on        | Browse         |       1 |        5
 1   |          0 | Log on -> Browse -> View live streams -> Browse -> Add to favorites       | Browse        | View live streams      |       2 |       55
 1   |          0 | Log on -> Browse -> View live streams -> Browse -> Add to favorites       | View live streams     | Browse         |       3 |        5
 1   |          0 | Log on -> Browse -> View live streams -> Browse -> Add to favorites       | Browse        | Add to favorites         |       4 |       65
 2   |          0 | Log on -> Purchase                                  |               | Log on         |       0 |        0
 2   |          0 | Log on -> Purchase                                  | Log on        | Purchase         |       1 |      120
 4   |          0 | Log on -> Browse -> Purchase -> View live streams -> Cancel the order |               | Log on         |       0 |        0
 4   |          0 | Log on -> Browse -> Purchase -> View live streams -> Cancel the order | Log on        | Browse         |       1 |       90
 4   |          0 | Log on -> Browse -> Purchase -> View live streams -> Cancel the order | Browse        | Purchase         |       2 |       30
 4   |          0 | Log on -> Browse -> Purchase -> View live streams -> Cancel the order | Purchase        | View live streams      |       3 |       90
 4   |          0 | Log on -> Browse -> Purchase -> View live streams -> Cancel the order | View live streams     | Cancel the order   |       4 |       60
(15 rows)

Example 8: View the details of a specified path

-- You can use the pad_tunnel function to view the details of a specified path. In this example, details or subpath information of the Browse > Purchase path is queried.
SELECT uid, pad_full_path(pad_funnel(path_analysis_detail(event, event_time, 'Log on', 180, 7, 0, false), array['Log on', 'Purchase'])) AS ret FROM path_demo GROUP BY uid;

The following code shows the returned result:

 uid |       ret
-----+------------------
 3   | {Log on}
 4   | {Log on -> Purchase}
 1   | {Log on}
 2   | {Log on -> Purchase}
(4 rows)