The basic funnel function windowFunnel can be used to calculate the funnel results of events within a specific time window. This topic describes how to use this function.
Limits
Only Hologres V0.9 and later support the windowFunnel function.
Precautions
To use funnel functions, you must execute the following statement as a superuser to install an extension:
CREATE extension flow_analysis; --Install the extension.
The extension is installed at the database level. For each database, you need to install the extension only once.
By default, the extension is loaded to the public schema. The extension cannot be loaded to other schemas. To use the extension in other schemas, you must specify a function name in the public.function format, such as
public.windowFunnel
.
Scenario description
All examples in this topic are based on the public event datasets on GitHub.
Dataset description
A large number of developers develop open source projects on GitHub and generate many events during the development process. GitHub records the type and details of each event, developers, code repositories, and other information. It opens public events, such as the events generated when you add items to your favorites or submit code. For more information about event types, see Webhook events and payloads.
Dataset import
You can use Hologres to import the github_event dataset into a Hologres instance with only a few clicks. For more information, see Import public datasets with a few clicks.
Statements for creating a table for GitHub public events:
BEGIN;
CREATE TABLE hologres_dataset_github_event.hologres_github_event (
id BIGINT,
actor_id BIGINT,
actor_login TEXT,
repo_id BIGINT,
repo_name TEXT,
org_id BIGINT,
org_login TEXT,
type TEXT,
created_at TIMESTAMP WITH TIME ZONE NOT NULL,
action TEXT,
iss_or_pr_id BIGINT,
number BIGINT,
comment_id BIGINT,
commit_id TEXT,
member_id BIGINT,
rev_or_push_or_rel_id BIGINT,
ref TEXT,
ref_type TEXT,
state TEXT,
author_association TEXT,
language TEXT,
merged BOOLEAN,
merged_at TIMESTAMP WITH TIME ZONE,
additions BIGINT,
deletions BIGINT,
changed_files BIGINT,
push_size BIGINT,
push_distinct_size BIGINT,
hr TEXT,
month TEXT,
year TEXT,
ds TEXT
);
CALL set_table_property('hologres_dataset_github_event.hologres_github_event', 'orientation', 'column');
CALL set_table_property('hologres_dataset_github_event.hologres_github_event', 'bitmap_columns', 'actor_login,repo_name,org_login,type,action,commit_id,ref,ref_type,state,author_association,language,hr,month,year,ds');
CALL set_table_property('hologres_dataset_github_event.hologres_github_event', 'clustering_key', 'created_at:asc');
CALL set_table_property('hologres_dataset_github_event.hologres_github_event', 'dictionary_encoding_columns', 'actor_login:auto,repo_name:auto,org_login:auto,type:auto,action:auto,commit_id:auto,ref:auto,ref_type:auto,state:auto,author_association:auto,language:auto,hr:auto,month:auto,year:auto,ds:auto');
CALL set_table_property('hologres_dataset_github_event.hologres_github_event', 'distribution_key', 'id');
CALL set_table_property('hologres_dataset_github_event.hologres_github_event', 'segment_key', 'created_at');
CALL set_table_property('hologres_dataset_github_event.hologres_github_event', 'time_to_live_in_seconds', '3153600000');
COMMENT ON TABLE hologres_dataset_github_event.hologres_github_event IS NULL;
END;
Function syntax
You can use the windowFunnel function to query events in a sliding time window and calculate the maximum number of events that match the specified conditions.
This function returns the maximum length of an ordered sequence of events starting from the first event. If a match fails, the function stops matching.
Examples in which the time window is long enough:
The function specifies events c1, c2, and c3, and the user data is c1, c2, c3, and c4. In this case, events c1, c2, and c3 are matched. The function returns 3.
The function specifies events c1, c2, and c3, and the user data is c4, c3, c2, and c1. In this case, the event c1 is matched. The function returns 1.
The function specifies events c1, c2, and c3, and the user data is c4 and c3. In this case, no events are matched. The function returns 0.
Syntax of the windowFunnel function:
windowFunnel(window, mode, timestamp, cond1, cond2, ..., condN)
Parameters:
Parameter | Description |
window | The length of the time window. The windowFunnel function uses the time when the first matched event occurs as the start point of the time window and extracts data of subsequent events based on the length of the time window. |
mode | The mode. Two modes are supported: default and strict. Default value: default.
|
timestamp | The time range of the occurrence of events. The TIMESTAMP, INT, and BIGINT data types are supported. |
cond | The event type, which indicates each stage of the event. |
Examples
The data in the public event datasets on GitHub in the preceding scenario description is used in the examples. If you want to analyze the funnel data of users following a fixed conversion path over a specified period of time, you can specify the following conditions in the SQL statement:
Statistics interval: 1,800 seconds (30 minutes)
Statistics period: from 10:00:00 on January 28, 2024 to 10:00:00 on January 31, 2024 (UTC+8)
User conversion path: CreateEvent > PushEvent > IssuesEvent
--Calculate the funnel data of each user.
SELECT
actor_id,
windowFunnel (1800, 'default', created_at, type = 'CreateEvent',type = 'PushEvent',type = 'IssuesEvent') AS level
FROM
hologres_dataset_github_event.hologres_github_event
WHERE
created_at >= TIMESTAMP '2024-01-28 10:00:00+08'
AND created_at < TIMESTAMP '2024-01-31 10:00:00+08'
GROUP BY
actor_id;
The following content is a part of the returned result, where:
level=0 indicates that the first event is not matched for the user within the time window.
level=1 indicates that the first event is matched for the user within the time window.
level=2 indicates that the second event is matched for the user within the time window.
level=3 indicates that all three events are matched for the user within the time window.
actor_id | level
----------+------
143037332 | 0
38708562 | 0
157624788 | 1
137850795 | 1
69616418 | 2
158019532 | 2
727125 | 3
To improve result readability, you can execute the following SQL statement to view the number of users converted at each stage:
WITH level_detail AS (
SELECT
level,
COUNT(1) AS count_user
FROM (
SELECT
actor_id,
windowFunnel (1800, 'default', created_at, type = 'CreateEvent', type = 'PushEvent',type = 'IssuesEvent') AS level
FROM
hologres_dataset_github_event.hologres_github_event
WHERE
created_at >= TIMESTAMP '2024-01-28 10:00:00+08'
AND created_at < TIMESTAMP '2024-01-31 10:00:00+08'
GROUP BY
actor_id) AS basic_table
GROUP BY
level
ORDER BY
level ASC
)
SELECT CASE level WHEN 0 THEN 'total'
WHEN 1 THEN 'CreateEvent'
WHEN 2 THEN 'PushEvent'
WHEN 3 THEN 'IssuesEvent'
END AS type
,SUM(count_user) over ( ORDER BY level DESC )
FROM
level_detail
GROUP BY
level,
count_user
ORDER BY
level ASC;
The following result is returned:
type | sum
------------+------
total | 1338166
CreateEvent | 461088
PushEvent | 202221
IssuesEvent | 4727