The funnel analysis feature of Data Lake Analytics (DLA) allows Q&M personnel to analyze the conversion rates at each step in a multi-step operation process. Assume that the purchase of a product triggers a series of events, including startup, logon, product search, product browsing, and order generation. O&M personnel analyze the conversion rates for these events in sequence within a period of time (for example, from January 5, 2017 to February 5, 2017). During the analysis, they calculate the following numbers: A, which refers to the total number of persons that trigger logon events among all users. B, which refers to the total number of persons that trigger product search events among A. C, which refers to the total number of persons that trigger product browsing events among B. D, which refers to the total number of persons that trigger order generation events among C.
Time window is a term in funnel analysis. All events are triggered within the same time window. Assume that the time window is one day. The product search events triggered by User 001 are valid only when they occur on the same day as logon events triggered by User 001. Similarly, the product browsing events triggered by User 001 are valid only when they occur on the same day as logon events triggered by User 001. The time window can be set to any time range, such as one day, three days, seven days, one hour, or six hours.
Prerequisites
The test data in this example is stored in Object Storage Service (OSS). You can perform the following steps to store your test data in OSS:
- Activate OSS.
- Create directories.
- Upload objects.
In this example, the following test data is uploaded to the
funnel_data
folder:1000002 1483258815538 Add products to favorites {} 20170101 1000002 1483274981790 Start up {} 20170101 1000002 1483223087508 Search for products {"content": "computer", "page_num": 1} 20170101 1000002 1483232016805 Search for products {"content": "Apple", "page_num": 2} 20170101 1000002 1483200895552 Complete orders {} 20170101 1000003 1483218002826 Search for products {"content": "computer", "page_num": 2} 20170101 1000003 1483206471681 Add products to the shopping cart {} 20170101 1000003 1483284553531 Add products to the shopping cart {} 20170101 1000003 1483279891663 Browse products {"brand": "Apple", "price": 9500} 20170101 1000003 1483259182702 Add products to the shopping cart {} 20170101 1000004 1483260505099 Log on {} 20170101 1000004 1483231995064 Add products to favorites {} 20170101 1000004 1483206318588 Add products to the shopping cart {} 20170101 1000004 1483241973408 Log on {} 20170101 1000004 1483202785557 Add products to favorites {} 20170101 1000005 1483214265416 Add products to favorites {} 20170101 1000005 1483206343383 Add products to favorites {} 20170101 1000005 1483229265488 Browse products {"brand": "HuaW", "price": 9500} 20170101 1000005 1483244294552 Generate orders {"price_all": 5000} 20170101 1000005 1483246988534 Log on {} 20170101 1000006 1483207838307 Add products to favorites {} 20170101
Before you use funnel analysis, make sure that the following prerequisites are met:
- DLA is activated. For more information, see Activate DLA.
- Reset the database password
Step 1: Create an OSS schema
CREATE Schema funnel_test_schema
WITH DBPROPERTIES (
catalog = 'oss',
location = 'oss://bucket-name/'
)
Step 2. Create a table
CREATE EXTERNAL TABLE IF NOT EXISTS funnel_test (
user_id bigint NOT NULL COMMENT '',
event_time bigint NOT NULL COMMENT '',
event_id int NOT NULL COMMENT '',
event_name varchar NOT NULL COMMENT '',
event_attr varchar NOT NULL COMMENT '',
event_date date NOT NULL COMMENT ''
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
STORED AS TEXTFILE
LOCATION 'oss://bucket-name/funnel_data/';
Step 3: Use a funnel analysis function to analyze data
The event IDs for the startup, logon, product search, product browsing, and order generation events are 10001, 10002, 10003, 10004, and 10007 in sequence.
-
In the following example, the
funnel_count
function is used to query the funnels that contain five events. These events are triggered from January 1 to January 20 with a time window of seven days. The unit of the time window is milliseconds. The function can also be used to save the data of the users involved in each event.SELECT user_id, funnel_count(event_time,7 * 86400000,event_id, '10001,10002,10003,10004,10007') AS xwho_state FROM funnel_test WHERE event_id IN (10001, 10002, 10003, 10004, 10007) AND event_date BETWEEN '2017-01-01' AND '2019-01-20' GROUP BY user_id;
-
In the following example, the
funnel_sum
function is used based on thefunnel_count
function. This allows you to obtain the overall conversion rate of the events.SELECT funnel_sum(xwho_state, 5) FROM ( SELECT user_id,funnel_count(event_time,7 * 86400000, event_id, '10001,10002,10003,10004,10007') AS xwho_state FROM funnel_test WHERE event_id IN (10001, 10002, 10003, 10004, 10007) AND event_date BETWEEN '2017-01-01' AND '2019-01-20' GROUP BY user_id );
-
If you want to filter data in the JSON attribute column
event_attr
, for example, filtering the prices that range from 3,500 to 5,000 from events whose IDs are 10004, you can execute the following SQL statement:SELECT funnel_sum(xwho_state, 5) FROM ( SELECT user_id,funnel_count(event_time, 7 * 86400000,event_id, '10001,10002,10003,10004,10007') AS xwho_state FROM funnel_test WHERE event_id IN (10001, 10002, 10003, 10004, 10007) AND event_date BETWEEN '2017-01-01' AND '2019-01-20' OR (event_id = 10004 AND json_extract_scalar(event_attr, '$.price') BETWEEN 3500 AND 5000) GROUP BY user_id );
The preceding paths and events are represented by IDs. In actual scenarios, you may
have only event names. DLA allows event names to represent the path events. In this
example, the event names are listed in the event_name
column.
SELECT funnel_sum(xwho_state, 7) AS funnel
FROM (SELECT user_id,funnel_count(event_time, 7 * 86400000,event_name,'startup,logon,product search, product browsing,order generation,order completion,product evaluation') AS xwho_state FROM funnel_test
WHERE event_name IN
('startup', 'logon', 'product search', 'product browsing', 'order generation', 'order completion', 'product evaluation')
GROUP BY user_id);