All Products
Search
Document Center

AnalyticDB:Path analysis functions

Last Updated:Jul 26, 2024

AnalyticDB for MySQL provides path analysis functions to help you efficiently process path computation jobs, match and count event sequences, and apply the analysis results to the detection and statistics of complex event sequences. This topic describes how to use path analysis functions.

AnalyticDB for MySQL supports the following path analysis functions:

  • SEQUENCE_MATCH(): checks whether a sequence of events occurs in a specific order.

  • SEQUENCE_COUNT(): counts the number of occurrences of a specific sequence of events.

Prerequisites

An AnalyticDB for MySQL cluster of V3.2.1.0 or later is created.

Note

For information about how to view the minor version of an AnalyticDB for MySQL cluster, see How do I view the minor version of a cluster? To update the minor version of a cluster, contact technical support.

Test dataset

The following example shows how to prepare a dataset for testing the path analysis functions:

  1. Create an AnalyticDB for MySQL test table.

    CREATE TABLE sequence_test(
      `uid` INT COMMENT 'user id',
      `ts` DATETIME COMMENT 'date time', 
      `number` INT NULL COMMENT 'number' 
    );
  2. Insert data into the test table.

    INSERT INTO sequence_test  VALUES (1, '2022-11-02 10:41:00', 1);
    INSERT INTO sequence_test  VALUES (2, '2022-11-02 13:28:02', 2); 
    INSERT INTO sequence_test  VALUES (3, '2022-11-02 16:15:01', 1);                                      
    INSERT INTO sequence_test  VALUES (4, '2022-11-02 19:05:04', 2); 
    INSERT INTO sequence_test  VALUES (5, '2022-11-02 20:08:44', 3); 

SEQUENCE_MATCH()

The SEQUENCE_MATCH() function checks whether a sequence of events occurs in a specific order. This function matches a sequence of events against the specified regular expression and returns a Boolean value. A value of 1 indicates that the sequence of events matches the specified regular expression. A value of 0 indicates that the sequence of events does not match the specified regular expression.

Syntax

SEQUENCE_MATCH(pattern, timestamp, cond1, cond2, ...)

Parameters

Parameter

Description

pattern

The regular expression that is used to match a specific event occurrence order.

Supported formats: '(?n)(?n)', '(?n).*(?n)', and '(?n)(?t operator value)(?n)'. Parameters:

  • (?n): the position of the condition for matching in a list of conditions. The value n cannot exceed the number of conditions and cannot be 0.

  • .*: allows any number of other events between two events that match the specified conditions.

  • (?t operator value): specifies the time difference between two events.

    • t: the timestamp field, which is a fixed reserved field.

    • operator: the operator. Valid values: >=, >, <, <=, ==, <>, and !=.

    • value: the time value against which to match, which can be a floating-point number, a positive number, or a negative number. The unit of the time value varies based on data types.

      • Unit for the DATATIME and TIMESTAMP data types: milliseconds.

      • Unit for the DATE data type: days.

      • Unit for the BIGINT data type: varies based on the time unit of the events.

timestamp

The timestamp field of the event, which specifies the time when the event occurred. The value of the timestamp field can be of the BIGINT, DATETIME, TIMESTAMP, or DATE type.

cond

The event conditions against which to match. Each condition is used to match a field of the events.

The data type of the return value is BOOLEAN.

You can specify up to 32 conditions.

Note

This function first queries the sequences of events that match the specified conditions and then matches the sequences against the specified regular expression.

Examples

  • Query the sequences of events that match the following conditions: number = 1, number = 2, and number = 3. Check whether the queried sequences of events meet the following requirement: An event that matches number = 3 occurred immediately after an event that matches number = 1.

    Sample statement:

    SELECT SEQUENCE_MATCH('(?1)(?3)', ts, number = 1, number = 2, number = 3) FROM sequence_test;

    Sample result:

    +-------------------------------------------------------------------+
    | sequence_match('(?1)(?3)', ts, number = 1, number = 2, number = 3)|
    +-------------------------------------------------------------------+
    |                                0                                  |
    +-------------------------------------------------------------------+
  • Query the sequences of events that match the following conditions: number = 1, number = 2, and number = 3. Check whether the queried sequences of events meet the following requirement: An event that matches number = 3 occurred after an event that matches number = 1 and other events optionally in between.

    Sample statement:

    SELECT SEQUENCE_MATCH('(?1).*(?3)', ts, number = 1, number = 2, number = 3) FROM sequence_test;

    Sample result:

    +----------------------------------------------------------------------+
    | sequence_match('(?1).*(?3)', ts, number = 1, number = 2, number = 3) |
    +----------------------------------------------------------------------+
    |                                  1                                   |
    +----------------------------------------------------------------------+
  • Query the sequences of events that match the following conditions: number = 1 and number = 2. Check whether the queried sequences of events meet the following requirement: An event that matches number = 2 occurred after an event that matches number = 1 and the time difference between the two events is greater than or equal to 3,600 milliseconds.

    Sample statement:

    SELECT SEQUENCE_MATCH('(?1)(?t>=3600)(?2)', ts, number = 1, number = 2) FROM sequence_test;

    Sample result:

    +------------------------------------------------------------------+
    | sequence_match('(?1)(?t>=3600)(?2)', ts, number = 1, number = 2) |
    +------------------------------------------------------------------+
    |                                1                                 |
    +------------------------------------------------------------------+

SEQUENCE_COUNT()

The SEQUENCE_COUNT() function counts the number of occurrences of a specific sequence of events.

Syntax

SEQUENCE_COUNT(pattern, timestamp, cond1, cond2, ...)

Parameters

Parameter

Description

pattern

The regular expression that is used to match a specific event occurrence order.

Supported formats: '(?n)(?n)', '(?n).*(?n)', and '(?n)(?t operator value)(?n)'. Parameters:

  • (?n): the position of the condition for matching in a list of conditions. The value n cannot exceed the number of conditions and cannot be 0.

  • .*: allows any number of other events between two events that match the specified conditions.

  • (?t operator value): specifies the time difference between two events.

    • t: the timestamp field, which is a fixed reserved field.

    • operator: the operator. Valid values: >=, >, <, <=, ==, <>, and !=.

    • value: the time value against which to match, which can be a floating-point number, a positive number, or a negative number. The unit of the time value varies based on data types.

      • Unit for the DATATIME and TIMESTAMP data types: milliseconds.

      • Unit for the DATE data type: days.

      • Unit for the BIGINT data type: varies based on the time unit of the events.

timestamp

The timestamp field of the event, which specifies the time when the event occurred. The value of the timestamp field can be of the BIGINT, DATETIME, TIMESTAMP, or DATE type.

cond

The event conditions against which to match. Each condition is used to match a field of the events.

The data type of the return value is BOOLEAN.

You can specify up to 32 conditions.

Note

This function first queries the sequences of events that match the specified conditions and then matches the sequences against the specified regular expression.

Examples

  • Query the sequences of events that match the following conditions: number = 1, number = 2, and number = 3. Count the number of sequences of events among the queried sequences of events that meet the following requirement: An event that matches number=3 occurred immediately after an event that matches number=1.

    Sample statement:

    SELECT SEQUENCE_COUNT('(?1)(?3)', ts, number = 1, number = 2, number = 3) FROM sequence_test;

    Sample result:

    +--------------------------------------------------------------------+
    | sequence_count('(?1)(?3)', ts, number = 1, number = 2, number = 3) |
    +--------------------------------------------------------------------+
    |                                0                                   |
    +--------------------------------------------------------------------+
  • Query the sequences of events that match the following conditions: number = 1, number = 2, and number = 3. Count the number of sequences of events among the queried sequences of events that meet the following requirement: An event that matches number = 3 occurred after an event that matches number = 1 and other events optionally in between.

    Sample statement:

    SELECT SEQUENCE_COUNT('(?1).*(?3)', ts, number = 1, number = 2, number = 3) FROM sequence_test;

    Sample result:

    +----------------------------------------------------------------------+
    | sequence_count('(?1).*(?3)', ts, number = 1, number = 2, number = 3) |
    +----------------------------------------------------------------------+
    |                                  1                                   |
    +----------------------------------------------------------------------+
  • Query the sequences of events that match the following conditions: number = 1 and number = 2. Count the number of sequences of events among the queried sequences of events that meet the following requirement: An event that matches number = 2 occurred after an event that matches number = 1 and the time difference between the two events is greater than or equal to 3,600 milliseconds.

    Sample statement:

    SELECT SEQUENCE_COUNT('(?1)(?t>=3600)(?2)', ts, number = 1, number = 2) FROM sequence_test;

    Sample result:

    +------------------------------------------------------------------+
    | sequence_count('(?1)(?t>=3600)(?2)', ts, number = 1, number = 2) |
    +------------------------------------------------------------------+
    |                              2                                   |
    +------------------------------------------------------------------+