All Products
Search
Document Center

AnalyticDB:Roaring bitmap functions

Last Updated:Aug 20, 2024

Roaring bitmaps are efficiently compressed bitmaps that are widely used in various programming languages and big data platforms for deduplication, tag-based filtering, and computing of time series data. This topic describes how to use roaring bitmap functions.

Supported versions

  • For AnalyticDB for MySQL clusters of V3.1.6.4 or later, the ROARING BITMAP type and roaring bitmap functions are supported in Object Storage Service (OSS) external tables.

  • For AnalyticDB for MySQL clusters of V3.2.1.0 or later, the ROARING BITMAP type and roaring bitmap functions are supported in OSS external tables and AnalyticDB for MySQL internal tables.

Note

For information about how to view the minor version of an AnalyticDB for MySQL cluster, see How do I view the minor version of a cluster? To update the minor version of a cluster, contact technical support.

Limits

When you use roaring bitmap functions in AnalyticDB for MySQL, take note of the following limits:

  • You cannot execute the SELECT statement to query data in a field of the ROARING BITMAP type. You must use the UNNEST operator to query the elements in a field of the ROARING BITMAP type. Sample statement:

    SELECT * FROM unnest(RB_BUILD(ARRAY[1,2,3]));
  • For AnalyticDB for MySQL clusters of versions earlier than V3.2.1.0, the ROARING BITMAP type is supported only in OSS external tables and AnalyticDB for MySQL internal tables that contain columns of the ROARING BITMAP type cannot be created. If you want to use roaring bitmap functions on AnalyticDB for MySQL internal tables, first use the RB_BUILD_VARBINARY function to convert VARBINARY data into the ROARING BITMAP type. Sample statement:

    // Create an internal table.
    CREATE TABLE test_rb_cstore (id INT, rb VARBINARY);
    
    // Invoke roaring bitmap functions.
    SELECT RB_CARDINALITY(RB_BUILD_VARBINARY(rb)) FROM test_rb_cstore;

Functions

Roaring bitmap functions include scalar functions and aggregate functions.

Scalar functions

Function

Input data type

Output data type

Description

Example

RB_BUILD

INT

ROARING BITMAP

Generates a roaring bitmap from an integer array.

RB_BUILD(ARRAY[1,2,3])

RB_BUILD_RANGE

INT,INT

ROARING BITMAP

Generates a roaring bitmap based on a range of integers from the start value (included) to the end value (excluded).

RB_BUILD_RANGE(0, 10000)

RB_BUILD_VARBINARY

VARBINARY

ROARING BITMAP

Generates a roaring bitmap from VARBINARY data.

RB_BUILD_VARBINARY(RB_TO_VARBINARY (RB_BUILD(ARRAY[1,2,3])))

RB_CARDINALITY

ROARING BITMAP

BIGINT

Calculates the cardinality of a roaring bitmap.

RB_CARDINALITY(RB_BUILD(ARRAY[1,2,3]))

RB_CONTAINS

ROARING BITMAP, INT

BOOLEAN

Checks whether a roaring bitmap contains integers.

RB_CONTAINS(RB_BUILD(ARRAY[1,2,3]), 3)

RB_AND

ROARING BITMAP, ROARING BITMAP

ROARING BITMAP

Obtains the intersection of two roaring bitmaps.

RB_AND(RB_BUILD(ARRAY[1,2,3]), RB_BUILD(ARRAY[2,3,4]))

RB_OR

ROARING BITMAP, ROARING BITMAP

ROARING BITMAP

Obtains the union of two roaring bitmaps.

RB_OR(RB_BUILD(ARRAY[1,2,3]),RB_BUILD(ARRAY[2,3,4]))

RB_XOR

ROARING BITMAP, ROARING BITMAP

ROARING BITMAP

Obtains the XOR value of two roaring bitmaps.

RB_XOR(RB_BUILD(ARRAY[1,2,3]),RB_BUILD(ARRAY[2,3,4]))

RB_AND_NULL2EMPTY

ROARING BITMAP, ROARING BITMAP

ROARING BITMAP

Performs an AND operation. If the input value of a parameter is null, the function uses the input value of another parameter as the output. If the input value of a parameter is a pair of empty braces ({}), the function uses a pair of empty braces ({}) as the output.

RB_AND_NULL2EMPTY(RB_BUILD(null),RB_BUILD(ARRAY[3,4,5]))

RB_OR_NULL2EMPTY

ROARING BITMAP, ROARING BITMAP

ROARING BITMAP

Performs an OR operation. If the input value is null, the function uses a pair of empty braces ({}) as the input.

RB_OR_NULL2EMPTY(RB_BUILD(null),RB_BUILD(ARRAY[3,4,5]))

RB_ANDNOT_NULL2EMPTY

ROARING BITMAP, ROARING BITMAP

ROARING BITMAP

Performs an ANDNOT operation. If the input value is null, the function uses a pair of empty braces ({}) as the input.

RB_ANDNOT_NULL2EMPTY(RB_BUILD(null),RB_BUILD(ARRAY[3,4,5]))

RB_AND_CARDINALITY

ROARING BITMAP, ROARING BITMAP

INTEGER

Calculates the cardinality by performing an AND operation on two roaring bitmaps.

RB_AND_CARDINALITY(RB_BUILD(ARRAY[1,2,3]),RB_BUILD(ARRAY[3,4,5]))

RB_AND_NULL2EMPTY_CARDINALITY

ROARING BITMAP, ROARING BITMAP

INTEGER

Calculates the cardinality by performing an AND operation on two roaring bitmaps. If the input value is null, the function uses a pair of empty braces ({}) as the input.

RB_AND_NULL2EMPTY_CARDINALITY(RB_BUILD(null),RB_BUILD(ARRAY[3,4,5]))

RB_OR_CARDINALITY

ROARING BITMAP, ROARING BITMAP

INTEGER

Calculates the cardinality by performing an OR operation on two roaring bitmaps.

RB_OR_CARDINALITY(RB_BUILD(ARRAY[1,2,3]),RB_BUILD(ARRAY[3,4,5]))

RB_OR_NULL2EMPTY_CARDINALITY

ROARING BITMAP, ROARING BITMAP

INTEGER

Calculates the cardinality by performing an OR operation on two roaring bitmaps. If the input value is null, the function uses a pair of empty braces ({}) as the input.

RB_OR_NULL2EMPTY_CARDINALITY(RB_BUILD(null),RB_BUILD(ARRAY[3,4,5]))

RB_XOR_CARDINALITY

ROARING BITMAP, ROARING BITMAP

INTEGER

Calculates the cardinality by performing an XOR operation on two roaring bitmaps.

RB_XOR_CARDINALITY(RB_BUILD(ARRAY[1,2,3]),RB_BUILD(ARRAY[3,4,5]))

RB_ANDNOT_CARDINALITY

ROARING BITMAP, ROARING BITMAP

INTEGER

Calculates the cardinality by performing an ANDNOT operation on two roaring bitmaps.

RB_ANDNOT_CARDINALITY(RB_BUILD(ARRAY[1,2,3]),RB_BUILD(ARRAY[3,4,5]))

RB_ANDNOT_NULL2EMPTY_CARDINALITY

ROARING BITMAP, ROARING BITMAP

INTEGER

Calculates the cardinality by performing an ANDNOT operation on two roaring bitmaps. If the input value is null, the function uses a pair of empty braces ({}) as the input.

RB_ANDNOT_NULL2EMPTY_CARDINALITY(RB_BUILD(ARRAY[1,2,3]),RB_BUILD(ARRAY[3,4,5]))

RB_IS_EMPTY

ROARING BITMAP

BOOLEAN

Checks whether a roaring bitmap is empty.

RB_IS_EMPTY(RB_BUILD(ARRAY[]))

RB_CLEAR

ROARING BITMAP,BIGINT,BIGINT

ROARING BITMAP

Clears the specified range excluding the range end in a roaring bitmap.

RB_CLEAR(RB_BUILD('{1,2,3}'), 2, 3)

RB_CONTAINS

ROARING BITMAP, ROARING BITMAP

BOOLEAN

Checks whether the first roaring bitmap contains the second roaring bitmap.

RB_CONTAINS(RB_BUILD(ARRAY[1,2,3]),RB_BUILD(ARRAY[3]))

RB_FLIP

ROARING BITMAP, INTEGER, INTEGER

ROARING BITMAP

Flips the specified offsets in a roaring bitmap.

RB_FLIP(RB_BUILD(ARRAY[1,2,3,4,5]), 2, 5)

RB_MINIMUM

ROARING BITMAP

INTEGER

Returns the smallest offset in a roaring bitmap. If the roaring bitmap is empty, an error is returned.

RB_MINIMUM(RB_BUILD(ARRAY[1,2,3]))

RB_MAXIMUM

ROARING BITMAP

INTEGER

Returns the largest offset in a roaring bitmap. If the roaring bitmap is empty, an error is returned.

RB_MAXIMUM(RB_BUILD(ARRAY[1,2,3]))

RB_RANK

ROARING BITMAP,INTEGER

INTEGER

Returns the number of elements that are less than or equal to a specific offset in a roaring bitmap.

RB_RANK(RB_BUILD(ARRAY[1,2,3]),2)

RB_TO_ARRAY

ROARING BITMAP

INTEGER

Returns an integer array from which a roaring bitmap is created.

RB_TO_ARRAY(RB_BUILD(ARRAY[1,2,3]))

RB_TO_VARBINAR

ROARING BITMAP

VARBINARY

Returns a VARBINARY array from which a roaring bitmap is created.

RB_TO_VARBINARY(RB_BUILD(ARRAY[1,2,3]))

RB_RANGE_CARDINALITY

ROARING BITMAP, INTEGER, INTEGER

INTEGER

Returns the cardinality of the range from the start position (included) to the end position (not included). The start position is 1.

Important

Only AnalyticDB for MySQL V3.1.10.0 or later supports this function.

RB_RANGE_CARDINALITY(RB_BUILD(ARRAY [1,2,3]),2,3)

RB_SELECT

ROARING BITMAP, INTEGER, INTEGER

ROARING BITMAP

Returns the offsets of a roaring bitmap in the range from the start position (included) to the end position (not included).

Important

Only AnalyticDB for MySQL V3.1.10.0 or later supports this function.

RB_SELECT(RB_BUILD(ARRAY [1,3,4,5,7,9]),2, 3)

Aggregate functions

Function

Input data type

Output data type

Description

Example

RB_BUILD_AGG

INTEGER

ROARING BITMAP

Creates a roaring bitmap from a group of offsets.

RB_CARDINALITY(RB_BUILD_AGG(1))

RB_OR_AGG

ROARING BITMAP

ROARING BITMAP

Performs an OR aggregate operation.

RB_CARDINALITY(RB_OR_AGG(RB_BUILD(array[1,2,3])))

RB_AND_AGG

ROARING BITMAP

ROARING BITMAP

Performs an AND aggregate operation.

RB_CARDINALITY(RB_AND_AGG(RB_BUILD(ARRAY[1,2,3])))

RB_XOR_AGG

ROARING BITMAP

ROARING BITMAP

Performs an XOR aggregate operation.

RB_CARDINALITY(RB_XOR_AGG(RB_BUILD(ARRAY[1,2,3])))

RB_OR_CARDINALITY_AGG

ROARING BITMAP

INTEGER

Calculates the cardinality by performing an OR aggregate operation on two roaring bitmaps.

RB_OR_CARDINALITY_AGG(RB_BUILD(ARRAY[1,2,3]))

RB_AND_CARDINALITY_AGG

ROARING BITMAP

INTEGER

Calculates the cardinality by performing an AND aggregate operation on two roaring bitmaps.

RB_AND_CARDINALITY_AGG(RB_BUILD(ARRAY[1,2,3]))

RB_XOR_CARDINALITY_AGG

ROARING BITMAP

INTEGER

Calculates the cardinality by performing an XOR aggregate operation on two roaring bitmaps.

RB_XOR_CARDINALITY_AGG(RB_BUILD(ARRAY[1,2,3]))

Examples

The following examples describe how to use roaring bitmap functions in an internal table and an external table.

Internal table

  1. Create an internal table that supports roaring bitmaps.

    CREATE TABLE `test_rb` (
      `id` INT,
      `rb` ROARINGBITMAP
    );
  2. Write data to the table.

    INSERT INTO test_rb VALUES (1, '[1, 2, 3]');
    INSERT INTO test_rb VALUES (2, '[2, 3, 4, 5, 6]');
  3. Obtain the cardinality of the roaring bitmaps.

    SELECT id, RB_CARDINALITY(rb) FROM test_rb;

    Sample result:

    +------+--------------------+
    | id   | rb_cardinality(rb) |
    +------+--------------------+
    |    2 |                  5 |
    |    1 |                  3 |
    +------+--------------------+
  4. Perform aggregation on the roaring bitmaps.

    SELECT RB_OR_CARDINALITY_AGG(rb) FROM test_rb;

    Sample result:

    +---------------------------+
    | rb_or_cardinality_agg(rb) |
    +---------------------------+
    |                         6 |
    +---------------------------+

External table

  1. Create an external table that supports roaring bitmaps.

    CREATE TABLE `test_rb` (
      `id` INT,
      `rb` ROARINGBITMAP
      ) engine = 'oss'
    TABLE_PROPERTIES = '{
    "endpoint":"oss-cn-zhangjiakou.aliyuncs.com",
    "accessid":"LTAIF****5FsE"",
    "accesskey":"Ccw****iWjv",
    "url":"oss://testBucketName/roaringbitmap/test_for_user/",
    "format":"parquet"
    }';
    Note

    For information about the parameters of an external table, see the "Create an OSS external table for a non-partitioned object" section of the Use external tables to import data to Data Warehouse Edition topic.

  2. Write data to the table.

    Important

    Writing data by using INSERT INTO statements is not a performant method. If you want to write a large amount of data, we recommend that you use an extract, transform, load (ETL) tool to generate data files in the Parquet format and upload them to an OSS directory.

    INSERT INTO test_rb SELECT 1, rb_build(ARRAY[1,2,3]);
    INSERT INTO test_rb SELECT 2, rb_build(ARRAY[2,3,4,5]);
  3. Obtain the cardinality of the roaring bitmaps.

    SELECT id, RB_CARDINALITY(rb) FROM test_rb;

    Sample result:

    +------+--------------------+
    | id   | rb_cardinality(rb) |
    +------+--------------------+
    |    2 |                  4 |
    |    1 |                  3 |
    +------+--------------------+
  4. Perform aggregation on the roaring bitmaps.

    SELECT RB_OR_CARDINALITY_AGG(rb) FROM test_rb;

    Sample result:

    +---------------------------+
    | rb_or_cardinality_agg(rb) |
    +---------------------------+
    |                         5 |
    +---------------------------+

Example: Identification of target customers

In this example, a tag table is converted into a roaring bitmap table, and analysis is performed based on the roaring bitmaps, as shown in the following figure. 1

Step 1: Prepare a tag table

  1. Create a tag table named users_base for testing.

    CREATE TABLE users_base(
       uid INT,
       tag1 STRING, // Valid values of tag1: x, y, and z 
       tag2 STRING, // Valid values of tag2: a and b 
       tag3 INT // Valid values of tag3: 1 to 10 
    );
  2. Insert data to generate 100 million rows of tag data.

    SUBMIT JOB
    INSERT OVERWRITE users_base
    SELECT CAST(ROW_NUMBER() OVER (ORDER BY c1) AS INT) AS uid, SUBSTRING('xyz', FLOOR(RAND() * 3) + 1, 1) AS tag1, SUBSTRING('ab', FLOOR(RAND() * 2) + 1, 1) AS tag2, CAST(FLOOR(RAND() * 10) + 1 AS INT) as tag3 FROM  
    (
    SELECT A.c1 FROM
    UNNEST(RB_BUILD_RANGE(0, 10000)) AS A(c1)
      JOIN
      (SELECT c1 FROM
    UNNEST(RB_BUILD_RANGE(0, 10000)) AS B(c1)
    ));
  3. Query 10 rows of data from the users_base table.

    SELECT * FROM users_base LIMIT 10;

    Sample result:

    +--------+------+------+------+
    | uid    | tag1 | tag2 | tag3 |
    +--------+------+------+------+
    |  74526 | y    | b    |    3 |
    |  75611 | z    | b    |   10 |
    |  80850 | x    | b    |    5 |
    |  81656 | z    | b    |    7 |
    | 163845 | x    | b    |    2 |
    | 167007 | y    | b    |    4 |
    | 170541 | y    | b    |    9 |
    | 213108 | x    | a    |   10 |
    |  66056 | y    | b    |    4 |
    |  67761 | z    | a    |    2 |
    +--------+------+------+------+

Step 2: Group the tag table

To utilize the concurrency capability of the distributed computing engines of AnalyticDB for MySQL, we recommend that you add columns that can group the data by uid. In this example, a column named user-group is created for grouping. The group sizes vary based on the total number of AnalyticDB compute units (ACUs) in your cluster.

  • A large number of groups bring high computing efficiency. However, if each roaring bitmap contains only a small number of elements, the system cannot take full advantage of roaring bitmaps.

  • In actual scenarios, we recommend that you limit the number of records within each group to less than 100 million. For example, if the total number of uid values is 10 billion, you can create 100 groups that each have 100 million rows of data.

In this example, uid values fall into 16 groups. The user_group column holds the remainders of uid values divided by 16. The offset column holds the quotient of the division. The offset column is used for the computation of roaring bitmaps. A uid can be calculated based on the following formula: uid = 16 × offset + user_group.

This grouping method is for reference only. You need to design your grouping based on your business requirements.

  1. Create a tag table named users that contains the column for grouping.

    CREATE TABLE users(
       uid INT,
       tag1 STRING,
       tag2 STRING,
       tag3 INT,
       user_group INT, // The grouping field.
       offset INT // The offset field.
    );
  2. Write the data of the users_base table to the users table.

    SUBMIT JOB INSERT OVERWRITE users SELECT uid, tag1, tag2, tag3, CAST(uid%16 AS INT), CAST(FLOOR(uid/16) AS INT) FROM users_base;
  3. Query 10 rows of data from the users table.

    SELECT * FROM users LIMIT 10;

    Sample result:

    +---------+------+------+------+------------+--------+
    | uid     | tag1 | tag2 | tag3 | user_group | offset |
    +---------+------+------+------+------------+--------+
    |  377194 | z    | b    |   10 |         10 |  23574 |
    |  309440 | x    | a    |    1 |          0 |  19340 |
    |  601745 | z    | a    |    7 |          1 |  37609 |
    |  753751 | z    | b    |    3 |          7 |  47109 |
    |  988186 | y    | a    |   10 |         10 |  61761 |
    |  883822 | x    | a    |    9 |         14 |  55238 |
    |  325065 | x    | b    |    6 |          9 |  20316 |
    | 1042875 | z    | a    |   10 |         11 |  65179 |
    |  928606 | y    | b    |    5 |         14 |  58037 |
    |  990858 | z    | a    |    8 |         10 |  61928 |
    +---------+------+------+------+------------+--------+

Step 3: Build roaring bitmaps

Internal table

  1. Create a roaring bitmap table named tag_tbl_1 for tag 1.

    CREATE TABLE `tag_tbl_1` (
      `tag1` STRING,
      `rb` ROARINGBITMAP,
      `user_group` INT
    );
  2. Write the data of the users table to the tag_tbl_1 table.

    INSERT OVERWRITE tag_tbl_1 SELECT tag1, RB_BUILD_AGG(offset), user_group FROM users GROUP BY tag1, user_group;
  3. Query the data of tag_tbl_1.

    SELECT tag1, user_group, RB_CARDINALITY(rb) FROM tag_tbl_1;

    Sample result:

    +------+------------+--------------------+
    | tag1 | user_group | rb_cardinality(rb) |
    +------+------------+--------------------+
    | y    |         13 |             563654 |
    | x    |         11 |             565013 |
    | z    |          2 |             564428 |
    | x    |          4 |             564377 |
    ...                                 
    | z    |          5 |             564333 |
    | x    |          8 |             564808 |
    | x    |          0 |             564228 |
    | y    |          3 |             563325 |
    +------+------------+--------------------+
  4. Create a roaring bitmap table named tag_tbl_2 for tag 2.

    CREATE TABLE `tag_tbl_2` (
      `tag2` STRING,
      `rb` ROARINGBITMAP,
      `user_group` INT
    );
  5. Write the data of the users table to the tag_tbl_2 table.

    INSERT OVERWRITE tag_tbl_2 SELECT tag2, RB_BUILD_AGG(offset), user_group FROM users GROUP BY tag2, user_group;
  6. Query the data of tag_tbl_2.

    SELECT tag2, user_group, RB_CARDINALITY(rb) FROM tag_tbl_2;

    Sample result:

    +------+------------+--------------------+
    | tag2 | user_group | rb_cardinality(rb) |
    +------+------------+--------------------+
    | a    |          9 |            3123039 |
    | a    |          5 |            3123973 |
    | a    |         12 |            3122414 |
    | a    |          7 |            3127218 |
    | a    |         15 |            3125403 |
    ...                                  
    | a    |         10 |            3122698 |
    | b    |          4 |            3126091 |
    | b    |          3 |            3124626 |
    | b    |          9 |            3126961 |
    | b    |         14 |            3125351 |
    +------+------------+--------------------+

External table

  1. Create a roaring bitmap table named tag_tbl_1 for tag 1.

    CREATE TABLE `tag_tbl_1` (
      `tag1` STRING,
      `rb` ROARINGBITMAP,
      `user_group` INT
     ) engine = 'oss'
    TABLE_PROPERTIES = '{
    "endpoint":"oss-cn-zhangjiakou.aliyuncs.com",
    "accessid":"LTAIF****5FsE"",
    "accesskey":"Ccw****iWjv",
    "url":"oss://testBucketName/roaringbitmap/tag_tbl_1/",
    "format":"parquet"
    }';
  2. Write the data of the users table to the tag_tbl_1 table.

    INSERT OVERWRITE tag_tbl_1 SELECT tag1, RB_BUILD_AGG(offset), user_group FROM users GROUP BY tag1, user_group;
  3. Query the data of tag_tbl_1.

    SELECT tag1, user_group, RB_CARDINALITY(rb) FROM tag_tbl_1;

    Sample result:

    +------+------------+--------------------+
    | tag1 | user_group | rb_cardinality(rb) |
    +------+------------+--------------------+
    | z    |          7 |            2082608 |
    | x    |         10 |            2082953 |
    | y    |          7 |            2084730 |
    | x    |         14 |            2084856 |
    ...                                  
    | z    |         15 |            2084535 |
    | z    |          5 |            2083204 |
    | x    |         11 |            2085239 |
    | z    |          1 |            2084879 |
    +------+------------+--------------------+
  4. Create a roaring bitmap table named tag_tbl_2 for tag 2.

    CREATE TABLE `tag_tbl_2` (
      `tag2` STRING,
      `rb` ROARINGBITMAP,
      `user_group` INT
     ) engine = 'oss'
    TABLE_PROPERTIES = '{
    "endpoint":"oss-cn-zhangjiakou.aliyuncs.com",
    "accessid":"LTAIF****5FsE"",
    "accesskey":"Ccw****iWjv",
    "url":"oss://testBucketName/roaringbitmap/tag_tbl_2/",
    "format":"parquet"
    }';
  5. Write the data of the users table to the tag_tbl_2 table.

    INSERT OVERWRITE tag_tbl_2 SELECT tag2, RB_BUILD_AGG(offset), user_group FROM users GROUP BY tag2, user_group;
  6. Query the data of tag_tbl_2.

    SELECT tag2, user_group, RB_CARDINALITY(rb) FROM tag_tbl_2;

    Sample result:

    +------+------------+--------------------+
    | tag2 | user_group | rb_cardinality(rb) |
    +------+------------+--------------------+
    | b    |         11 |            3121361 |
    | a    |          6 |            3124750 |
    | a    |          1 |            3125433 |
    ...                                    
    | b    |          2 |            3126523 |
    | b    |         12 |            3123452 |
    | a    |          4 |            3126111 |
    | a    |         13 |            3123316 |
    | a    |          2 |            3123477 |
    +------+------------+--------------------+

Step 4: Perform analysis based on the roaring bitmaps

Scenario 1: Filtering analysis

In this example, the number of users for each value of tag2 that meet tag1 in (x, y) is obtained.

  1. Query the users that meet tag1 in (x, y).

    SELECT tag2, t1.user_group AS user_group, RB_CARDINALITY(RB_AND(t2.rb, rb1)) AS rb FROM tag_tbl_2 AS t2
    JOIN (
    SELECT user_group, rb AS rb1 FROM tag_tbl_1 WHERE tag1 IN ('x', 'y'))
    AS t1
    ON t1.user_group = t2.user_group;

    Sample result:

    +------+------------+---------+
    | tag2 | user_group | rb      |
    +------+------------+---------+
    | b    |          3 | 1041828 |
    | a    |          15| 1039859 |
    | a    |          9 | 1039140 |
    | b    |          1 | 1041524 |
    | a    |          4 | 1041599 |
    | b    |          1 | 1041381 |
    | b    |          10| 1041026 |
    | b    |          6 | 1042289 |
    +------+------------+---------+
  2. Obtain the number of users for each value of tag2 that meet tag1 in (x, y).

    SELECT tag2, SUM(cnt) FROM ( 
    SELECT tag2, t1.user_group AS user_group, RB_CARDINALITY(RB_AND(t2.rb, rb1)) AS cnt FROM tag_tbl_2 AS t2
    JOIN (
    SELECT user_group, rb AS rb1 FROM tag_tbl_1 WHERE tag1 IN ('x', 'y'))
    AS t1
    ON t1.user_group = t2.user_group
    ) GROUP BY tag2;

    Sample result:

    +------+----------+
    | tag2 | sum(cnt) |
    +------+----------+
    | a    | 33327868 |
    | b    | 33335220 |
    +------+----------+

Scenario 2: Calculate the intersection, union, and XOR of roaring bitmaps

Retrieve data from the tag_tbl_1 table that meets tag1 = 'x' or tag1 = 'y' and data from the tag_tbl_2 table that meets tag2 = 'b', and obtain the data that meets both conditions.

SELECT user_group, RB_CARDINALITY(rb) FROM (
    SELECT
      t1.user_group AS user_group,
      RB_AND(rb1, rb2) AS rb
    FROM
      (
        SELECT
          user_group,
          RB_OR_AGG(rb) AS rb1
        FROM
          tag_tbl_1
        WHERE
          tag1 = 'x'
          OR tag1 = 'y'
        GROUP BY
          user_group
      ) AS t1
      JOIN (
        SELECT
          user_group,
          RB_OR_AGG(rb) AS rb2
        FROM
          tag_tbl_2
        WHERE
          tag2 = 'b'
        GROUP BY
          user_group
      ) AS t2 ON t1.user_group = t2.user_group
  GROUP BY user_group);

Sample result:

+------------+--------------------+
| user_group | rb_cardinality(rb) |
+------------+--------------------+
|         10 |            2083679 |
|          3 |            2082370 |
|          9 |            2082847 |
|          2 |            2086511 |
...                              
|          1 |            2082291 |
|          4 |            2083290 |
|         14 |            2083581 |
|         15 |            2084110 |
+------------+--------------------+

Scenario 3: Calculate the intersection, union, and XOR of roaring bitmaps and original tag tables

Retrieve data from the tag_tbl_1 table that meets tag1 = 'x' or tag1 = 'y' and data from the users table that meets tag2 = 'b', and obtain the data that meets both conditions.

SELECT user_group, RB_CARDINALITY(rb) FROM (
    SELECT
      t1.user_group AS user_group,
      RB_AND(rb1, rb2) AS rb
    FROM
      (
        SELECT
          user_group,
          RB_OR_AGG(rb) AS rb1
        FROM
          tag_tbl_1
        WHERE
          tag1 = 'x'
          OR tag1 = 'y'
        GROUP BY
          user_group
      ) AS t1
      JOIN (
        SELECT
          user_group,
          RB_BUILD_AGG(offset) AS rb2
        FROM
          users
        WHERE
          tag2 = 'b'
        GROUP BY
          user_group
      ) AS t2 ON t1.user_group = t2.user_group
  GROUP BY user_group);

Sample result:

+------------+--------------------+
| user_group | rb_cardinality(rb) |
+------------+--------------------+
|          3 |            2082370 |
|          1 |            2082291 |
|          0 |            2082383 |
|          4 |            2083290 |
|         11 |            2081662 |
|         13 |            2085280 |
...                              
|         14 |            2083581 |
|         15 |            2084110 |
|          9 |            2082847 |
|          8 |            2084860 |
|          5 |            2083056 |
|          7 |            2083275 |
+------------+--------------------+

Scenario 4: Export results to OSS by using an OSS external table

  1. Create a tag table named tag_tbl_3 to which the calculation results are exported.

    CREATE TABLE `tag_tbl_3` (
      `user_group` INT,
      `rb` ROARINGBITMAP
      )engine = 'oss'
    TABLE_PROPERTIES = '{
    "endpoint":"oss-cn-zhangjiakou.aliyuncs.com",
    "accessid":"LTAIF****5FsE"",
    "accesskey":"Ccw****iWjv",
    "url":"oss://testBucketName/roaringbitmap/tag_tbl_3/",
    "format":"parquet"
    }';
  2. Export the results in Scenario 2 to tag_tbl_3.

    INSERT OVERWRITE tag_tbl_3
        SELECT
          t1.user_group AS user_group,
          RB_AND(rb1, rb2) AS rb
        FROM
          (
            SELECT
              user_group,
              RB_OR_AGG(rb) AS rb1
            FROM
              tag_tbl_1
            WHERE
              tag1 = 'x'
              OR tag1 = 'y'
            GROUP BY
              user_group
          ) AS t1
          JOIN (
            SELECT
              user_group,
              RB_OR_AGG(rb) AS rb2
            FROM
              tag_tbl_2
            WHERE
              tag2 = 'b'
            GROUP BY
              user_group
          ) AS t2 ON t1.user_group = t2.user_group;
    Note

    After the statement is executed, the file is stored as a Parquet file in the oss://testBucketName/roaringbitmap/tag_tbl_3/ directory.

Scenario 5: Accelerate queries on an external table

You can import data from the tag_tbl_1 table to an internal table for better query performance.

  1. Create an internal table named tag_tbl_1_cstore and set the data type of the rb field to VARBINARY.

    CREATE TABLE `tag_tbl_1_cstore` (
     `tag1` VARCHAR,
     `rb` VARBINARY,
     `user_group` INT
    );
  2. Import data of the tag_tbl_1 table from OSS to tag_tbl_1_cstore.

    INSERT INTO tag_tbl_1_cstore SELECT tag1, RB_TO_VARBINARY(rb), user_group FROM tag_tbl_1;
  3. Query the data of the tag_tbl_1_cstore table.

    SELECT tag1, user_group, RB_CARDINALITY(RB_OR_AGG(RB_BUILD_VARBINARY(rb))) FROM tag_tbl_1_cstore GROUP BY tag1, user_group;

    Sample result:

    +------+------------+---------------------------------------------------+
    | tag1 | user_group | rb_cardinality(rb_or_agg(rb_build_varbinary(rb))) |
    +------+------------+---------------------------------------------------+
    | y    |          3 |                                           2082919 |
    | x    |          9 |                                           2083085 |
    | x    |          3 |                                           2082140 |
    | y    |         11 |                                           2082268 |
    | z    |          4 |                                           2082451 |
    ...                                                                    
    | z    |          2 |                                           2081560 |
    | y    |          6 |                                           2082194 |
    | z    |          7 |                                           2082608 |
    +------+------------+---------------------------------------------------+