By Yu Li (Hengjiang)
Insight analysis, intelligent user identification, and business reporting are crucial scenarios that require the use of OLAP analytic databases. AnalyticDB for MySQL, a cloud-native data warehouse, has been successfully implemented in various marketing scenarios for platforms like Taobao, Ele.me, Cainiao, Youku, and Hema. Through a series of articles, we will explore the implementation and application of AnalyticDB for MySQL in marketing data products. This article focuses on the implementation and application of Funnel Analysis.
Marketing professionals aim to optimize the user conversion path and regain lost customers by taking specific measures and conducting experiments at different stages of AARRR (Acquisition、Activation, Retention, Revenue and Referral). To achieve this, accurate calculation of user behavioral data at each stage is essential, referred to as funnel transformation. Additionally, high-performance data products are crucial, as no one likes a constantly loading frontend UI. Lack of ad hoc interaction capabilities in data products often leads to complaints and dissatisfaction.
In the past, database products focused primarily on providing standard SQL semantics without considering the implementation of specific business scenarios. Let's explore how to implement a funnel function using standard SQL semantics. Suppose we have a dataset of user behavior that includes comprehensive information about the user, their actions, and the location where these actions occur. The user behavior data table, named "user_behavior," is structured as follows:
There are four types of user behaviors:
When developing a data product, we often encounter two types of funnel requirements. The first is to display data on the homepage of the data report for decision-makers, focusing only on the statistics of each event. This implementation is relatively simple.
select
event_type,
count(distinct uid)
from
user_behavior
where
item_id = 3838928
and ts >= 1511540732
and ts <= 1512312625
group by
event_type
However, this type of funnel can only provide coarse-grained statistical information and cannot analyze the causal relationship and behavior path before and after the event.
For example, the path for a user to purchase a product may involve clicking on the product (pv), adding it to the shopping cart (cart), and finally making a purchase (buy). It could also involve clicking on the product (pv), adding it to favorites (fav), adding it to the shopping cart (cart), and then making a purchase (buy). While we can calculate the number of independent users for each event, we cannot express the information about how many users make direct purchases or add products to their favorites before buying.
To address this, we need to analyze each user's event sequence, such as pv, pv, fav, cart, buy, pv, pv, buy, pv, cart, buy, and determine if they meet specific continuous event sequences like "pv, fav, cart, buy." In practical business scenarios, it is more common to require satisfaction of discontinuous subsequences like pv...fav...cart..buy, as shown in the figure below.
Unfortunately, aggregate functions that provide subsequence matching functionality are typically not supported in database products. Based on the author's understanding, a possible solution is to use string matching functions. This involves converting each user's event sequence into a string and then matching each string with the target sequence string. The SQL pseudocode for this implementation is as follows:
/*Transform the data that meets the conditions of the target event into an event flag, such as e1 , e2 , and e3.*/
with t1 as (
select
uid,
ts,
case event_type when "pv" then "e1"
when "fav" then "e2"
when "cart" then "e3"
when "buy" then "e4"
else "ex" end as event_code
from
user_behavior
)
/*Count the number of users at each level.*/
select
level,
count(distinct uid)
from
(
select
uid,
/*Calculate the maximum number of subsequences for each user.*/
case
when event_lst like "%e1%e2%e3%e4%" then "level_4"
when event_lst like "%e1%e2%e3%" then "level_3"
when event_lst like "%e1%e2%" then "level_2"
when event_lst like "%e1%" then "level_1"
else "level_0" end as level
from
(
/*Aggregate user events into an event sequence.*/
SELECT
uid,
GROUP_CONCAT(
event_code
order by
ts asc
) as event_lst
from
t1
group by
uid
)
)
group by level
There are several performance bottlenecks in the above implementation:
In addition, it can be seen that although this implementation works, the SQL is extremely complex and has not been combined with other user attributes, such as the associated queries of user tag tables, so the expansion capability is limited.
To address the pain points in the previous funnel scenarios, AnalyticDB for MySQL has introduced the window_funnel function, which is defined as follows:
Function description: The window_funnel function allows you to query events from a sliding time window. It calculates the maximum length of the event list that matches the specified conditions. The function starts matching from the first event in the list and continues the longest ordered matching process. If a match fails, the entire matching process ends. Assuming a large enough time window, the following examples illustrate the usage of this function:
• If query events are c1, c2, and c3, and the user data consists of c1, c2, c3, and c4, the function will match events c1, c2, and c3. Therefore, the function will return 3.
• If query events are c1, c2, and c3, and the user data consists of c4, c3, c2, and c1, the function will match event c1. Therefore, the function will return 1.
• If query events are c1, c2, and c3, and the user data consists of c4 and c3, no events will be matched. Therefore, the function will return 0.
Syntax: window_funnel (window, mode, timestamp, cond1, cond2, ..., condN)
Parameter description
Based on the window_funnel function, we implement the funnel computational logic. The SQL statement is as follows:
select
funnel_step,
count(1)
from
(
/*Directly calculate the satisfying behavior sequence for each user.*/
select
uid,
window_funnel(
cast(86400000 /*Standard SQL implementation of semantic alignment is set to 1000 days, and the actual business can be flexibly configured according to the needs*/ as integer),
"default",
ts,
event_type = 'pv',
event_type = 'fav',
event_type = 'cart',
event_type = 'buy'
) as funnel_step
from
user_behavior
group by
uid
)
group by
funnel_step;
As you can see, compared to standard SQL implementation:
Note:
This article discusses the scenario of insight analysis - funnel analysis and how to use the window_funnel function in AnalyticDB for MySQL to implement the funnel computation function. Based on the analysis above, we can observe that compared to the traditional SQL implementation, the window_funnel function can reduce the complexity of SQL queries, provide richer semantics for sliding windows, and offer better query performance. The query performance does not degrade as the funnel level deepens. In scenarios with deep funnel levels, there is a performance improvement of over 10 times, enabling end users to obtain analysis results without waiting.
Learn more: https://www.alibabacloud.com/help/analyticdb-for-mysql/latest/window-funnel-retention-function
Check out Free Tier & Special Offers of Alibaba Cloud Database Products: https://www.alibabacloud.com/product/databases
About Database Kernel | PolarDB Optimizer Query Transformation - Join Condition Pushdown
[Infographic] Highlights | Database New Feature in September 2023
ApsaraDB - March 17, 2023
Alibaba Cloud Community - August 4, 2023
ApsaraDB - July 4, 2022
Qiyang Duan - May 28, 2020
ApsaraDB - November 28, 2022
Alibaba Clouder - June 22, 2018
AnalyticDB for MySQL is a real-time data warehousing service that can process petabytes of data with high concurrency and low latency.
Learn MoreAlibaba Cloud PolarDB for MySQL is a cloud-native relational database service 100% compatible with MySQL.
Learn MoreA real-time data warehouse for serving and analytics which is compatible with PostgreSQL.
Learn MoreHelp media companies build a discovery service for their customers to find the most appropriate content.
Learn MoreMore Posts by ApsaraDB