Funnel analysis is a common conversion analysis method and is widely used in data operations and analysis scenarios, such as the analysis of user behavior, the traffic analysis of app data, and analysis of the product goal conversion. Administrators or operations personnel can use funnel analysis to measure the performance at each stage based on the conversion rate. This way, they can optimize products to increase conversion rates.
Background information
An event represents a specific or series of meaningful behavior of a user, such as downloading, registering with, and logging on to a game app. The actual usage process of the user can be restored by analyzing various behavioral data of the user. This improves product conversion rates and helps business growth. Common user behavior analysis includes event analysis, funnel analysis, and retention analysis. Funnel analysis is an analysis model used to analyze the conversion of user behavior at specific stages. It can help you analyze the conversion rate of user behavior at each stage, and then help you measure the performance at each stage based on the conversion rate. This way, you can take specific measures to optimize user experience on products to increase conversion rates.
Hologres is a one-stop real-time data warehousing service developed by Alibaba Cloud. It supports multi-dimensional analysis of real-time data in multiple scenarios. In user behavior analysis scenarios, Hologres provides multiple funnel functions to help you quickly and efficiently analyze user behavior. The functions are widely used in user analysis scenarios in industries such as the Internet, e-commerce, and gaming.
Limits
Only Hologres V0.9 and later support the windowFunnel function.
Only Hologres V2.1 and later support the range_funnel function.
To use funnel functions, you must execute the following statement as a superuser to install an extension:
CREATE extension flow_analysis; -- Install an extension.
An extension is installed at the database level. For each database, you need to install an extension only once.
By default, the extension is installed in the public schema. The extension cannot be installed in other schemas.
Scenario description
The examples in this topic are based on the public event datasets on GitHub.
Dataset
A large number of developers develop open source projects on GitHub and generate a large number of events during the development of the projects. GitHub records the type and details of each event, the developer, and the code repository. GitHub also exposes public events, such as the events generated when you add items to your favorites or submit code. For more information about event types, see Webhook events and payloads.
Dataset import
You can use Hologres to import the github_event dataset to a Hologres instance with only a few clicks. For more information, see Import public datasets with a few clicks.
Statements for creating a table for GitHub public events:
BEGIN;
CREATE TABLE hologres_dataset_github_event.hologres_github_event (
id bigint,
actor_id bigint,
actor_login text,
repo_id bigint,
repo_name text,
org_id bigint,
org_login text,
type text,
created_at timestamp with time zone NOT NULL,
action text,
iss_or_pr_id bigint,
number bigint,
comment_id bigint,
commit_id text,
member_id bigint,
rev_or_push_or_rel_id bigint,
ref text,
ref_type text,
state text,
author_association text,
language text,
merged boolean,
merged_at timestamp with time zone,
additions bigint,
deletions bigint,
changed_files bigint,
push_size bigint,
push_distinct_size bigint,
hr text,
month text,
year text,
ds text
);
CALL set_table_property('hologres_dataset_github_event.hologres_github_event', 'orientation', 'column');
CALL set_table_property('hologres_dataset_github_event.hologres_github_event', 'bitmap_columns', 'actor_login,repo_name,org_login,type,action,commit_id,ref,ref_type,state,author_association,language,hr,month,year,ds');
CALL set_table_property('hologres_dataset_github_event.hologres_github_event', 'clustering_key', 'created_at:asc');
CALL set_table_property('hologres_dataset_github_event.hologres_github_event', 'dictionary_encoding_columns', 'actor_login:auto,repo_name:auto,org_login:auto,type:auto,action:auto,commit_id:auto,ref:auto,ref_type:auto,state:auto,author_association:auto,language:auto,hr:auto,month:auto,year:auto,ds:auto');
CALL set_table_property('hologres_dataset_github_event.hologres_github_event', 'distribution_key', 'id');
CALL set_table_property('hologres_dataset_github_event.hologres_github_event', 'segment_key', 'created_at');
CALL set_table_property('hologres_dataset_github_event.hologres_github_event', 'time_to_live_in_seconds', '3153600000');
COMMENT ON TABLE hologres_dataset_github_event.hologres_github_event IS NULL;
ALTER TABLE hologres_dataset_github_event.hologres_github_event OWNER TO test1_developer;
END;
windowFunnel function
Funnel analysis is an analysis model used to analyze the conversion of users at specified stages. It can help you analyze the conversion rate of user behavior at each stage, and then help administrators or operations personnel measure the performance at each stage by using the conversion rate. This way, they can take specific measures to optimize user experience on products to increase conversion rates.
Function description
You can use the windowFunnel function to query events from a sliding time window. This function calculates the maximum number of events that match the query conditions.
This function returns the maximum number of ordered and matched events starting from the first event in the event list. If a match fails, the entire match ends.
Examples in which the time window is long enough:
The function specifies events c1, c2, and c3, and the user data indicates c1, c2, c3, and c4. In this case, events c1, c2, and c3 are matched. The function returns 3.
The function specifies events c1, c2, and c3, and the user data indicates c4, c3, c2, and c1. In this case, the event c1 is matched. The function returns 1.
The function specifies events c1, c2, and c3, and the user data indicates c4 and c3. In this case, no events are matched. The function returns 0.
The following code describes the syntax of the windowFunnel function:
windowFunnel(window, mode, timestamp, cond1, cond2, ..., condN)
The following table describes the parameters in the preceding syntax.
Parameter | Description |
window | The length of the time window. The windowFunnel function uses the first event in the specified conditions as the start point of the time window. Then, this function determines the event list based on the length of the time window. |
mode | The working mode of the function. Two modes are supported: default and strict. Default value: default.
|
timestamp | The timestamp when an event occurred. Supported data types are TIMESTAMP, INT, and BIGINT. |
cond | The event type, which indicates each step of the event. |
Examples
The data in the public event datasets on GitHub in the preceding scenario description is used in this example. If you want to analyze the funnel data in which user events match the order of the specified events within a specified period of time, you can specify the following conditions in the SQL statement:
Time window: 30 minutes, which equals 1,800 seconds.
Time period: 2023-07-28 10:00:00+08 to 2023-07-31 10:00:00+08.
Event order: CreateEvent > PushEvent > IssuesEvent.
-- Analyze the funnel data of each user.
SELECT
actor_id,
windowFunnel (1800, 'default', created_at, type = 'CreateEvent',type = 'PushEvent',type = 'IssuesEvent') AS level
FROM
hologres_dataset_github_event.hologres_github_event
WHERE
created_at >= TIMESTAMP '2023-07-28 10:00:00+08'
AND created_at < TIMESTAMP '2023-07-31 10:00:00+08'
GROUP BY
actor_id
The following content is a part of the returned result, where:
level=0 indicates that the first event is not matched for the user within the time window.
level=1 indicates that the first event is matched for the user within the time window.
level=2 indicates that the second event is matched for the user within the time window.
level=3 indicates that all three events are matched for the user within the time window.
actor_id | level
----------+------
69438111 | 0
62698183 | 0
139085682 | 1
1765075 | 1
132836353 | 2
122081842 | 2
89815869 | 3
To improve result readability, you can execute the following SQL statement to view the number of users converted at each stage:
WITH level_detail AS (
SELECT
level,
COUNT(1) AS count_user
FROM (
SELECT
actor_id,
windowFunnel (1800, 'default', created_at, type = 'CreateEvent', type = 'PushEvent',type = 'IssuesEvent') AS level
FROM
hologres_dataset_github_event.hologres_github_event
WHERE
created_at >= TIMESTAMP '2023-07-28 10:00:00+08'
AND created_at < TIMESTAMP '2023-07-31 10:00:00+08'
GROUP BY
actor_id) AS basic_table
GROUP BY
level
ORDER BY
level ASC
)
SELECT CASE level WHEN 0 THEN 'total'
WHEN 1 THEN 'CreateEvent'
WHEN 2 THEN 'PushEvent'
WHEN 3 THEN 'IssuesEvent'
END
,SUM(count_user) over ( ORDER BY level DESC )
FROM
level_detail
GROUP BY
level,
count_user
ORDER BY
level ASC;
The following result is returned:
case | sum
------------+------
total | 864120
CreateEvent | 275053
PushEvent | 120242
IssuesEvent | 2652
Range funnel functions
When you use Hologres to perform funnel analysis, you need to collect grouping statistics in most cases, such as statistics by day, statistics by hour, and statistics by another custom time window. To better meet business requirements, Hologres V2.1 and later introduce the additional function range_funnel based on the windowFunnel function. The range_funnel and windowFunnel functions differ in the following aspects:
The windowFunnel function can aggregate the input event data only once, and the result in the specified time window is complete. The range_funnel function can return both the aggregation result for the complete time period and the grouping statistics for a custom time window. The result is an array.
The windowFunnel function does not support the extraction of multiple identical events, whereas the range_funnel function supports extraction of multiple identical events. The following section describes the matching logic of the range_funnel function:
If the function specifies events c1, c2, and c3 and the user data indicates c1, c2, c1, and c3, the function returns 3.
If the function specifies events c1, c1, and c1 and the user data indicates c1, c2, c1, and c3, the function returns 2.
range_funnel function
Only Hologres V2.1 and later support the function.
Syntax
range_funnel(window, event_size, range_begin, range_end, interval, event_ts, event_bits, use_interval_window, mode)
Parameters
Parameter
Data type
Description
window
interval
The length of the time window. The range_funnel function uses the first event in the specified conditions as the start point of the time window. Then, this function determines the event list based on the length of the time window. Unit: seconds.
If you set the window parameter to 0, the function performs truncation according to the start point and end point of each time period. If the truncation result is 0:00 every day, data is filtered by a calendar day.
If you set the use_interval_window parameter to false, the original semantics are used. Unit: seconds.
If you set the use_interval_window parameter to true, n time periods (including the current time period) are included in the window. You can also set the window parameter in the unit of day to filter data by calendar days.
event_size
int
The total number of events that you want to analyze.
range_begin
timestamptz/timestamp/date
The start time of the time period for analysis, calculated from the first event.
range_end
timestamptz/timestamp/date
The end time of the time period for analysis, calculated from the first event.
interval
interval
The interval at which the time period for analysis is divided into multiple consecutive intervals. Then, window funnel analysis is performed on each interval to obtain results. Unit: seconds.
event_ts
timestamp/timestamptz
The time when an event occurred. The TIMESTAMP and TIMESTAMPTZ data types are supported.
NoteThis parameter is calculated from 00:00. The time may differ from the actual time. In most cases, this parameter is used to analyze the trend by day or week. Therefore, the actual time can be ignored.
event_bits
bitmap
The event type. The value must be a bitmap of the INT32 type. An event is represented by an order from the least significant bit to the most significant bit. Therefore, the window funnel analysis supports up to 32 events.
use_interval_window
text
Specifies whether to use the interval boundary computing window. Default value: false.
NoteOnly Hologres V2.2.30 and later support this parameter.
mode
text
If you set the
mode
parameter to 0, only one event is randomly extracted from events that occur at the same time and counted as a conversion, and the other events are discarded. This is the default value.If you set the
mode
parameter to 1, different events that occur at the same time are counted as different conversions.
NoteAn event that occurs at a time is counted as one conversion.
Only Hologres V2.2.30 and later support this parameter.
Returned result
The range_funnel function returns an array of the INT64 type, which corresponds to BIGINT[] in SQL. Note that the result of the array is encoded. It is displayed by interval and consists of the start time of an interval and the number of extracted events. The length for the start time of an interval is 56 bits, and the length for the number of extracted events is 8 bits. Therefore, after the result is obtained, the content of the array must be decoded to obtain the final matching data.
The results returned by the range_funnel function are encoded. You need to execute SQL statements to decode the results. Hologres V2.1.6 and later introduce the range_funnel_time and range_funnel_level functions for you to decode results returned by the range_funnel function.
Range funnel decoding functions
Syntax
range_funnel_time(range_funnel()) range_funnel_level(range_funnel())
Functions
Function
Description
Input parameter
Output parameter
range_funnel_time
Decodes the event time in the result returned by the range_funnel function. The result is of the INT64 type.
The result of the INT64 type that is returned by the range_funnel function.
The event time of the TIMESTAMPTZ type.
range_funnel_level
Decodes the event level in the result returned by the range_funnel function. The result is of the INT64 type.
The result of the INT64 type that is returned by the range_funnel function.
The event level of the BIGINT type.
Examples
Display funnel information by day
The data in the public event datasets on GitHub in the preceding scenario description is used in this example. If you want to analyze the funnel data in which user events match the order of the specified events within a specified period of time and view the grouping result by day, you can specify the following conditions in the SQL statement:
Time window: 1 hour, which equals 3,600 seconds.
Time period: 2023-07-23 to 2023-07-25, 3 days in total
Event order: CreateEvent > PushEvent.
Grouping interval: 1 day, which equals 86,400 seconds. The window funnel analysis results are displayed by day.
The value of the type field is of the TEXT type. The value of the event_bits field in the range_funnel function must be a 32-bit bitmap. Therefore, you must use the bit_construct function to convert the value of the type field into a bitmap.
-- Result without decoding
SELECT
actor_id,
range_funnel (3600, 2, '2023-07-23', '2023-07-25', 86400, created_at::timestamp, bits) AS result
FROM (
SELECT
actor_id,
created_at::timestamp,
type,
bit_construct (a := type = 'CreateEvent', b := type = 'PushEvent') AS bits
FROM
hologres_dataset_github_event.hologres_github_event WHERE ds >= '2023-07-23' AND ds <='2023-07-25') tt GROUP BY actor_id ORDER BY actor_id ;
The following content is a part of the returned result:
actor_id | result
----------+------
17 |{432658022400,432680140800,432702259202,9223372036854775554}
47 |{432658022402,432680140800,432702259200,9223372036854775554}
235 |{432658022401,432680140800,432702259200,9223372036854775553}
Description of the result field:
If the value of the result field is empty, the behavior of the user does not match the condition within any interval.
If the value of the result field contains data, the data is an undecoded array that includes the overall funnel result by day.
In this example, the range_funnel_time and range_funnel_level functions are used to decode the result that is returned by the range_funnel function in the preceding example. The decoding results are displayed by user ID. Sample SQL statement:
SELECT actor_id,
to_timestamp(range_funnel_time(result)) AS res_time, -- Decode the event time.
range_funnel_level(result) AS res_level -- Decode the event level.
FROM (
SELECT actor_id, result, COUNT(1) AS cnt FROM (
SELECT actor_id,
UNNEST(range_funnel (3600, 2, '2023-07-23', '2023-07-26', 86400, created_at::timestamp, bits)) AS result FROM (
SELECT actor_id, created_at::timestamp, type, bit_construct (a := type = 'CreateEvent', b := type = 'PushEvent') AS bits from hologres_dataset_github_event.hologres_github_event where ds >= '2023-07-23' AND ds <='2023-07-25'
) a
GROUP BY actor_id
) a
GROUP BY actor_id ,result
) a
ORDER BY actor_id ,res_time limit 10000;
The following content is a part of the returned result, from which you can obtain the level and number of matches per user per day:
actor_id | res_time | res_level
----------------+-----------------------+-----------
17 |2023-07-23 08:00:00 | 0
17 |2023-07-24 08:00:00 | 0
17 |2023-07-24 08:00:00 | 2
17 |\N | 0
47 |2023-07-23 08:00:00 | 0
47 |2023-07-24 08:00:00 | 0
47 |2023-07-25 08:00:00 | 2
47 |\N | 2
235 |2023-07-23 08:00:00 | 1
235 |2023-07-24 08:00:00 | 0
235 |2023-07-25 08:00:00 | 0
235 |\N | 1
After you obtain the funnel result of each user by day, you can further explore the data based on your business requirements. For example, you can execute the following SQL statement to view the daily step size summary and total summary data, where the next level contains the data of the previous level:
SELECT res_time, res_level, SUM(cnt) OVER (PARTITION BY res_time ORDER BY res_level DESC) AS res_cnt FROM (
SELECT
to_timestamp(range_funnel_time(result)) AS res_time, -- Decode the event time.
range_funnel_level(result) AS res_level, -- Decode the event level.
cnt
FROM (
SELECT result, COUNT(1) AS cnt FROM (
SELECT actor_id,
UNNEST(range_funnel (3600, 2, '2023-07-23', '2023-07-26', 86400, created_at::timestamp, bits)) AS result FROM (
SELECT actor_id, created_at::timestamp, type, bit_construct (a := type = 'CreateEvent', b := type = 'PushEvent') AS bits FROM hologres_dataset_github_event.hologres_github_event WHERE ds >= '2023-07-23' AND ds <='2023-07-25'
) a
GROUP BY actor_id
) a
GROUP BY result
) a
)a
WHERE res_level > 0
GROUP BY res_time, res_level, cnt ORDER BY res_time, res_level;
The following content shows the query result:
\N
indicates the result aggregated for multiple days.The
cnt
field includes the summary data of each level. The next level contains the data of the previous level. For example, if the value of the res_level field is 2 and the value of the cnt field is 1, only one user goes through steps 1 and 2.
res_time |res_level | cnt
------------------------+---------------+------
2023-07-23 08:00:00 |1 |114455
2023-07-23 08:00:00 |2 |56155
2023-07-24 08:00:00 |1 |154616
2023-07-24 08:00:00 |2 |71200
2023-07-25 08:00:00 |1 |178662
2023-07-25 08:00:00 |2 |81779
\N |1 |390262
\N |2 |188125
Count different events that occur at the same time as different conversions
If you set the mode
parameter in the range_funnel function to 1, different events that occur at the same time are counted as different conversions. Sample statements:
CREATE extension flow_analysis;
CREATE TABLE funnel_test (
uid int,
event text,
create_time timestamptz
);
INSERT INTO funnel_test VALUES
(11, 'login', '2024-09-26 16:15:28+08'),
(11, 'watch', '2024-09-26 16:15:28+08'),
(11, 'buy', '2024-09-26 16:16:28+08'),
(22, 'login', '2024-09-26 16:15:28+08'),
(22, 'watch', '2024-09-26 16:16:28+08'),
(22, 'buy', '2024-09-26 16:17:28+08');
SELECT res_time, res_level, SUM(cnt) OVER (PARTITION BY res_time ORDER BY res_level DESC) AS res_cnt FROM (
SELECT
to_timestamp(range_funnel_time(result)) AS res_time, -- Decode the event time.
range_funnel_level(result) AS res_level, -- Decode the event level.
cnt
FROM (
SELECT result, COUNT(1) AS cnt FROM (
SELECT uid,
UNNEST(range_funnel (3600, 3, '2024-09-26', '2024-09-27', 86400, create_time::timestamp, bits,false,'1')) AS result FROM (
SELECT uid, create_time::timestamp, event, bit_construct (a := event = 'login', b := event = 'watch',c := event = 'buy') AS bits FROM funnel_test
) a
GROUP BY uid
) a
GROUP BY result
) a
)a
GROUP BY res_time, res_level, cnt ORDER BY res_time, res_level;
The following output shows that different events that occur at the same time are counted as different conversions.
res_time | res_level | res_cnt
------------------------+-----------+---------
2024-09-26 08:00:00+08 | 3 | 2
| 3 | 2
(2 rows)
Perform statistics by day with the time window set to multiple calendar days
In actual scenarios, you may need to analyze conversion rates across intervals. In this case, you can set the use_interval_window
parameter in the range_funnel function to true. In this example, the time window is set to multiple calendar days.
-- Perform statistics by day with the time window set to multiple calendar days.
CREATE TABLE funnel_test_2 (
uid int,
event text,
create_time timestamptz
);
INSERT INTO funnel_test_2 VALUES
(11, 'login', '2024-09-24 16:15:28+08'),
(11, 'watch', '2024-09-25 16:15:28+08'),
(11, 'buy', '2024-09-26 16:16:28+08'),
(22, 'login', '2024-09-24 16:15:28+08'),
(22, 'watch', '2024-09-25 16:16:28+08'),
(22, 'buy', '2024-09-26 16:17:28+08');
Calculate conversion rates when the time window is set to multiple calendar days.
-- The time window is set to three calendar days.
SELECT
to_timestamp(range_funnel_time(result)) AS res_time, -- Decode the event time.
range_funnel_level(result) AS res_level, -- Decode the event level.
cnt
FROM (
SELECT result, count(1) AS cnt FROM (
SELECT uid,
unnest(range_funnel (3, 3, '2024-09-24', '2024-09-27', 86400, create_time::timestamp, bits,true,'1')) AS result FROM (
SELECT uid, create_time::timestamp, event, bit_construct (a := event = 'login', b := event = 'watch',c := event = 'buy') AS bits FROM funnel_test_2
) a
GROUP BY uid
) a
GROUP BY result
) a;
The following output shows the daily conversion status.
res_time | res_level | cnt
------------------------+-----------+-----
2024-09-26 08:00:00+08 | 0 | 2
| 3 | 2
2024-09-24 08:00:00+08 | 3 | 2
2024-09-25 08:00:00+08 | 0 | 2
(4 rows)