AnalyticDB for MySQL provides path analysis functions to help you efficiently process path computation jobs, match and count event sequences, and apply the analysis results to the detection and statistics of complex event sequences. This topic describes how to use path analysis functions.
AnalyticDB for MySQL supports the following path analysis functions:
SEQUENCE_MATCH(): checks whether a sequence of events occurs in a specific order.
SEQUENCE_COUNT(): counts the number of occurrences of a specific sequence of events.
Prerequisites
An AnalyticDB for MySQL cluster of V3.2.1.0 or later is created.
For information about how to view the minor version of an AnalyticDB for MySQL cluster, see How do I view the minor version of a cluster? To update the minor version of a cluster, contact technical support.
Test dataset
The following example shows how to prepare a dataset for testing the path analysis functions:
Create an AnalyticDB for MySQL test table.
CREATE TABLE sequence_test( `uid` INT COMMENT 'user id', `ts` DATETIME COMMENT 'date time', `number` INT NULL COMMENT 'number' );
Insert data into the test table.
INSERT INTO sequence_test VALUES (1, '2022-11-02 10:41:00', 1); INSERT INTO sequence_test VALUES (2, '2022-11-02 13:28:02', 2); INSERT INTO sequence_test VALUES (3, '2022-11-02 16:15:01', 1); INSERT INTO sequence_test VALUES (4, '2022-11-02 19:05:04', 2); INSERT INTO sequence_test VALUES (5, '2022-11-02 20:08:44', 3);
SEQUENCE_MATCH()
The SEQUENCE_MATCH() function checks whether a sequence of events occurs in a specific order. This function matches a sequence of events against the specified regular expression and returns a Boolean value. A value of 1 indicates that the sequence of events matches the specified regular expression. A value of 0 indicates that the sequence of events does not match the specified regular expression.
Syntax
SEQUENCE_MATCH(pattern, timestamp, cond1, cond2, ...)
Parameters
Parameter | Description |
pattern | The regular expression that is used to match a specific event occurrence order. Supported formats:
|
timestamp | The timestamp field of the event, which specifies the time when the event occurred. The value of the timestamp field can be of the BIGINT, DATETIME, TIMESTAMP, or DATE type. |
cond | The event conditions against which to match. Each condition is used to match a field of the events. The data type of the return value is BOOLEAN. You can specify up to 32 conditions. Note This function first queries the sequences of events that match the specified conditions and then matches the sequences against the specified regular expression. |
Examples
Query the sequences of events that match the following conditions:
number = 1
,number = 2
, andnumber = 3
. Check whether the queried sequences of events meet the following requirement: An event that matchesnumber = 3
occurred immediately after an event that matchesnumber = 1
.Sample statement:
SELECT SEQUENCE_MATCH('(?1)(?3)', ts, number = 1, number = 2, number = 3) FROM sequence_test;
Sample result:
+-------------------------------------------------------------------+ | sequence_match('(?1)(?3)', ts, number = 1, number = 2, number = 3)| +-------------------------------------------------------------------+ | 0 | +-------------------------------------------------------------------+
Query the sequences of events that match the following conditions:
number = 1
,number = 2
, andnumber = 3
. Check whether the queried sequences of events meet the following requirement: An event that matchesnumber = 3
occurred after an event that matchesnumber = 1
and other events optionally in between.Sample statement:
SELECT SEQUENCE_MATCH('(?1).*(?3)', ts, number = 1, number = 2, number = 3) FROM sequence_test;
Sample result:
+----------------------------------------------------------------------+ | sequence_match('(?1).*(?3)', ts, number = 1, number = 2, number = 3) | +----------------------------------------------------------------------+ | 1 | +----------------------------------------------------------------------+
Query the sequences of events that match the following conditions:
number = 1
andnumber = 2
. Check whether the queried sequences of events meet the following requirement: An event that matchesnumber = 2
occurred after an event that matchesnumber = 1
and the time difference between the two events is greater than or equal to 3,600 milliseconds.Sample statement:
SELECT SEQUENCE_MATCH('(?1)(?t>=3600)(?2)', ts, number = 1, number = 2) FROM sequence_test;
Sample result:
+------------------------------------------------------------------+ | sequence_match('(?1)(?t>=3600)(?2)', ts, number = 1, number = 2) | +------------------------------------------------------------------+ | 1 | +------------------------------------------------------------------+
SEQUENCE_COUNT()
The SEQUENCE_COUNT() function counts the number of occurrences of a specific sequence of events.
Syntax
SEQUENCE_COUNT(pattern, timestamp, cond1, cond2, ...)
Parameters
Parameter | Description |
pattern | The regular expression that is used to match a specific event occurrence order. Supported formats:
|
timestamp | The timestamp field of the event, which specifies the time when the event occurred. The value of the timestamp field can be of the BIGINT, DATETIME, TIMESTAMP, or DATE type. |
cond | The event conditions against which to match. Each condition is used to match a field of the events. The data type of the return value is BOOLEAN. You can specify up to 32 conditions. Note This function first queries the sequences of events that match the specified conditions and then matches the sequences against the specified regular expression. |
Examples
Query the sequences of events that match the following conditions:
number = 1
,number = 2
, andnumber = 3
. Count the number of sequences of events among the queried sequences of events that meet the following requirement: An event that matchesnumber=3
occurred immediately after an event that matchesnumber=1
.Sample statement:
SELECT SEQUENCE_COUNT('(?1)(?3)', ts, number = 1, number = 2, number = 3) FROM sequence_test;
Sample result:
+--------------------------------------------------------------------+ | sequence_count('(?1)(?3)', ts, number = 1, number = 2, number = 3) | +--------------------------------------------------------------------+ | 0 | +--------------------------------------------------------------------+
Query the sequences of events that match the following conditions:
number = 1
,number = 2
, andnumber = 3
. Count the number of sequences of events among the queried sequences of events that meet the following requirement: An event that matchesnumber = 3
occurred after an event that matchesnumber = 1
and other events optionally in between.Sample statement:
SELECT SEQUENCE_COUNT('(?1).*(?3)', ts, number = 1, number = 2, number = 3) FROM sequence_test;
Sample result:
+----------------------------------------------------------------------+ | sequence_count('(?1).*(?3)', ts, number = 1, number = 2, number = 3) | +----------------------------------------------------------------------+ | 1 | +----------------------------------------------------------------------+
Query the sequences of events that match the following conditions:
number = 1
andnumber = 2
. Count the number of sequences of events among the queried sequences of events that meet the following requirement: An event that matchesnumber = 2
occurred after an event that matchesnumber = 1
and the time difference between the two events is greater than or equal to 3,600 milliseconds.Sample statement:
SELECT SEQUENCE_COUNT('(?1)(?t>=3600)(?2)', ts, number = 1, number = 2) FROM sequence_test;
Sample result:
+------------------------------------------------------------------+ | sequence_count('(?1)(?t>=3600)(?2)', ts, number = 1, number = 2) | +------------------------------------------------------------------+ | 2 | +------------------------------------------------------------------+