
Hologres: User profile analysis

Last Updated: Dec 20, 2024

This topic describes the best practices for tagging and profile analysis in Hologres.

Background

Profile analysis is the process of exploring user interests and analyzing group characteristics based on the natural, behavioral, and preference properties of intended users. User profiling is an important means to depict the comprehensive characteristics of an individual user or a user group. It provides information such as user preferences and behavior for operation analysis personnel to optimize operational strategies. It also provides accurate role information for dedicated product designs. A profiling system typically integrates the user characteristics processing and profile analysis features to provide real-time group analysis and identification after offline processing of characteristics, mapping of tags, and loading of ad hoc analysis data.

Profile analysis has been widely applied in a variety of industries and has become an important means to optimize operational strategies and implement refined operations and precise marketing. The following examples are typical scenarios for which profile analysis is suited.

  • Advertising: Profile analysis provides insights into users to implement targeted advertising.

  • Gaming: Profile analysis helps analyze churn rates so that operational strategies can be adjusted to increase user stickiness.

  • Education: Profile analysis provides analysis on course quality to improve the retention rate.

However, profile analysis faces challenges in system stability, maintainability, and scalability that are caused by complex data, large data volumes, and diverse query patterns.

  • The O&M personnel must maintain separate data links for real-time processing and offline processing, which leads to heavy workloads. Traditional online analytical processing (OLAP) engines use an architecture in which storage is coupled with computing. As a result, when the required computing and storage resources are not proportionate to each other, resources are wasted and system scaling and migration costs are high.

  • The operations personnel require flexible identification capabilities. To describe a single user, thousands of dimensions may be required, including property and behavior data. Multidimensional OLAP (MOLAP) provides responses within milliseconds but lacks flexibility. Relational OLAP (ROLAP) provides flexibility but responds more slowly.

Hologres solutions

To address the preceding issues, Hologres allows you to determine a solution that offers high performance and scalability by configuring data links, selecting plug-in libraries, and considering the size of your business system.

  • Data links

    Hologres supports both real-time and offline data processing without the need to maintain multiple data links. This prevents common issues such as data inconsistency and data silos. The following figure shows a data link.

    [Figure: Data link]

    Hologres provides the following benefits in data integration:

    • Hologres is seamlessly integrated with DataWorks. Complex data dependency issues can be resolved by making access configurations, and stable offline data processing and loading processes are provided.

    • Hologres provides row-oriented storage based on the log-structured merge (LSM) structure for scenarios that involve real-time writes. Hologres is integrated with Realtime Compute for Apache Flink to provide stable performance support for real-time tagging and real-time characteristic processing.

    • Hologres provides the federated query capability and allows access to external data storage services such as MaxCompute, Object Storage Service (OSS), and other Hologres instances by using foreign tables.

  • Profile computing

    Hologres is compatible with the PostgreSQL ecosystem and provides an abundance of built-in functions. In addition, many efficient profile computing plug-ins have been developed on top of the best practices of Alibaba Cloud and its users.

    • Precise deduplication: Roaring bitmap functions

      Hologres supports Roaring bitmaps. It supports union and intersection operations on sets and bitwise aggregate operations by using efficient compressed bitmaps. Roaring bitmaps are suitable for computing tables that contain unique data with multiple dimensions and are typically used in deduplication (UV computing), tag-based filtering, and quasi-real-time user profile analysis. UV is short for unique visitor.
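
      The set operations described above can be sketched in a few lines of Python. This is only a conceptual model: it uses plain Python integers as uncompressed bitmaps, whereas real Roaring bitmaps are compressed, and it does not call any Hologres function. The tag names and uid values are invented for illustration.

```python
def to_bitmap(uids):
    """Set bit i for every uid i; repeated uids collapse into one bit."""
    bitmap = 0
    for uid in uids:
        bitmap |= 1 << uid
    return bitmap


def cardinality(bitmap):
    """Number of distinct uids in the bitmap (the UV count)."""
    return bin(bitmap).count("1")


# Hypothetical tag data; the uid lists deliberately contain repeats.
male = to_bitmap([1, 2, 2, 5, 8])
in_beijing = to_bitmap([2, 3, 5, 5, 9])

uv_male = cardinality(male)              # deduplicated user count for one tag
both = cardinality(male & in_beijing)    # intersection: users with both tags
either = cardinality(male | in_beijing)  # union: users with at least one tag
print(uv_male, both, either)
```

      Deduplication falls out of the representation itself: setting the same bit twice changes nothing, so counting set bits always yields a distinct count.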

    • Action data-based user identification: Intended user identification functions

      In action data-based user identification scenarios, action data is recorded in a table by day or hour. Users who take specific actions within a specific period of time cannot be queried directly because each user's action data is scattered across multiple rows. Without dedicated functions, you must perform multiple self-joins on the action data table to find such users. For example, you want to query users whose actions include both [action='click' and page='Shopping cart'] and [action='view' and page='Favorites'] with the ds value ranging from 20210216 to 20210218.

      [Figure: Action details table]

      Hologres provides the bit_construct, bit_or, and bit_match functions to prevent the negative impact of JOIN operations on query performance and simplify SQL operations. These functions are used to filter users. Users whose uid meets specific filter conditions are stored as bit arrays. Then, the bit_match function is used to perform AND operations on the bit arrays. The following code shows an example.

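      -- Build one bit mask per user, then keep users whose mask satisfies a AND b.
      WITH tbl AS (
        SELECT uid,
               bit_or(bit_construct(
                 a := (action = 'click' AND page = 'Shopping cart'),
                 b := (action = 'view' AND page = 'Favorites'))) AS uid_mask
        FROM ods_app_dwd
        -- Inclusive range, so that ds covers 20210216 through 20210218.
        WHERE ds >= '20210216' AND ds <= '20210218'
        GROUP BY uid
      )
      SELECT uid FROM tbl WHERE bit_match('a&b', uid_mask);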
      • bit_construct: returns values for expressions and stores the values in bit arrays. For example, this function returns [1,0], [0,0], [0,1]... for conditions a and b in the preceding SQL statement.

      • bit_or: performs a bitwise OR across the bit arrays of all rows in a group. For each user, a bit in the result is set if at least one of the user's rows meets the corresponding condition.

      • bit_match: determines whether a bit array matches an expression. For example, for the a&b expression, this function returns True for [1,1] and False for [1,0].
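
      To make the roles of the three functions concrete, the following Python sketch models their semantics on a handful of rows. It is not the Hologres implementation: the helper names construct and users_matching_a_and_b, the sample rows, and the inclusive date range are assumptions made for illustration.

```python
from collections import defaultdict

# Hypothetical action rows: (uid, ds, action, page).
ROWS = [
    (1, "20210216", "click", "Shopping cart"),
    (1, "20210217", "view", "Favorites"),
    (2, "20210216", "click", "Shopping cart"),
    (3, "20210218", "view", "Favorites"),
    (3, "20210218", "click", "Shopping cart"),
    (4, "20210219", "click", "Shopping cart"),  # outside the date range
    (4, "20210217", "view", "Favorites"),
]

A, B = 0b01, 0b10  # one bit per condition: a and b


def construct(action, page):
    """Model of bit_construct: evaluate both conditions for a single row."""
    mask = 0
    if action == "click" and page == "Shopping cart":
        mask |= A
    if action == "view" and page == "Favorites":
        mask |= B
    return mask


def users_matching_a_and_b(rows, start, end):
    """Model of GROUP BY uid with bit_or, followed by bit_match('a&b', ...)."""
    masks = defaultdict(int)
    for uid, ds, action, page in rows:
        if start <= ds <= end:  # assumed inclusive range
            masks[uid] |= construct(action, page)  # OR the per-row masks
    return sorted(uid for uid, m in masks.items() if m & A and m & B)


matched = users_matching_a_and_b(ROWS, "20210216", "20210218")
print(matched)
```

      With these sample rows, users 1 and 3 match. User 2 only satisfies condition a, and user 4 satisfies condition a only on a day outside the range. A single pass over the rows replaces the repeated self-joins.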

    • Funnel analysis: Funnel functions

      Funnel analysis is a popular conversion analytics method that is used to understand user behavior and calculate conversion rates. Funnel analysis is widely used for data operations and analysis scenarios such as the analysis of user behavior, application data traffic, and product goal conversion.

      You can use the window funnel function to query events from a sliding time window. This function calculates the maximum number of events that can match the query conditions. Retention analysis is the most common and typical scenario in which user growth is analyzed. In most cases, you can use charts to analyze user retention. The funnel and retention functions can be used to calculate user retention and conversion rates, reduce overheads in complex JOIN operations, and improve performance.
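
      The following Python sketch illustrates the idea behind a window funnel computation. It is a simplified model under stated assumptions, not the Hologres function: the name window_funnel, its parameters, the rule that the window opens at an event matching the first step, and the sample events are all hypothetical.

```python
def window_funnel(events, steps, window):
    """Return how many leading steps are matched in order within `window`.

    events: list of (timestamp, event_name) for one user.
    steps:  non-empty, ordered list of event names that form the funnel.
    The window opens at any event that matches steps[0].
    """
    events = sorted(events)
    best = 0
    for i, (start_ts, name) in enumerate(events):
        if name != steps[0]:
            continue
        level = 1
        for ts, later in events[i + 1:]:
            if ts - start_ts > window:
                break  # event falls outside the sliding window
            if level < len(steps) and later == steps[level]:
                level += 1
        best = max(best, level)
    return best


STEPS = ["view", "cart", "pay"]
USERS = {  # timestamps in seconds
    "u1": [(0, "view"), (100, "cart"), (5000, "pay")],
    "u2": [(0, "view"), (4000, "view"), (4100, "cart"), (4200, "pay")],
    "u3": [(0, "cart")],
}

levels = {uid: window_funnel(ev, STEPS, 3600) for uid, ev in USERS.items()}
# Number of users who reached at least step 1, step 2, and step 3.
reached = [sum(1 for lv in levels.values() if lv >= k)
           for k in range(1, len(STEPS) + 1)]
print(levels, reached)
```

      Dividing adjacent entries of reached gives step-to-step conversion rates. Because each user's events are scanned in a single ordered pass, no join between per-step subqueries is needed.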

    • Vector processing: Vector processing based on Proxima

      Proxima is a high-performance software library developed by Alibaba DAMO Academy. It allows you to search for the nearest neighbors of vectors. Proxima provides higher stability and performance than similar open source software such as Facebook AI Similarity Search (Faiss). Proxima provides basic modules with industry-leading performance and accuracy and allows you to search for similar images, videos, or human faces. Hologres is deeply integrated with Proxima to provide a high-performance vector search service. K-nearest neighbors (KNN) searches, radius nearest neighbors (RNN) searches, and dot product (DOT_PRODUCT) calculations are supported.

  • Solutions

    Different cost and performance requirements are imposed at different development stages of profiling systems. Hologres provides the following solutions based on practical experience and factors such as system data size, implementation cost, and query performance:

    • Wide tables

      This solution is suited for scenarios in which fewer than 1,000 tags are used and data is infrequently updated. Stable property tables are aggregated into wide tables offline, and JOIN operations on multiple tables are converted into operations on a single wide table. If new tags are required, columns are added to the wide table for these tags. This enables flexible tag-based computing on a single table. For more information, see Wide tables.

    • Roaring bitmaps

      This solution is suited for scenarios in which a large amount of data is involved, a large number of tags are used, and deduplication is required. The structured storage of Roaring bitmaps implements natural deduplications, prevents JOIN overheads, simplifies operations, and accelerates data retrieval. For more information, see Roaring bitmaps.

    • Bit-sliced index (BSI)

      This solution is suitable for association analysis of user attribute tags and user behavior tags, and can significantly optimize the computing performance of high-cardinality behavior tags that still involve a large amount of data after deduplication. User attribute tags include gender and province, and user behavior tags include page views (PVs) and order amounts. BSI and Roaring bitmaps are used together to convert complex computing operations such as tag deduplication, UNION operations, and JOIN operations into bitwise operations on BSIs and Roaring bitmaps. This simplifies computing operations and quickly provides behavior tag analysis results. For more information, see BSI.
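
      As a rough illustration of how a bit-sliced index combines with a user bitmap, the Python sketch below sums a behavior tag (order amount) over an attribute group (a gender bitmap). It is a conceptual model only: plain integers stand in for Roaring bitmaps, and the names build_bsi and bsi_sum and the sample values are hypothetical, not Hologres functions or data.

```python
def popcount(bitmap):
    """Number of set bits, that is, the number of uids in the bitmap."""
    return bin(bitmap).count("1")


def build_bsi(values):
    """values: {uid: non-negative int}.

    Returns a list of bitmaps; bit `uid` of slices[k] is set when
    bit k of that uid's value is 1.
    """
    slices = []
    for uid, value in values.items():
        k = 0
        while value:
            if len(slices) <= k:
                slices.append(0)
            if value & 1:
                slices[k] |= 1 << uid
            value >>= 1
            k += 1
    return slices


def bsi_sum(slices, group):
    """Sum the values of all uids in `group` using only AND and popcount."""
    return sum((1 << k) * popcount(s & group) for k, s in enumerate(slices))


# Hypothetical behavior tag: order amount per uid.
order_amount = {1: 5, 2: 12, 3: 7, 4: 0}
slices = build_bsi(order_amount)

# Hypothetical attribute tag: bitmap of uids 1 and 3.
female = (1 << 1) | (1 << 3)

total_female = bsi_sum(slices, female)
print(total_female)
```

      The attribute tag and the behavior tag are never joined row by row: each slice is intersected with the group bitmap, and the weighted bit counts add up to the total.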

  • Summary

    Hologres supports a wide range of profile analysis plug-ins and delivers excellent performance. It is widely used in tag computing and profile analysis scenarios by multiple core businesses within Alibaba Group, such as Alimama, search applications, and AMap, and many public cloud users. The service scalability and stability of Hologres have been tested in production. Hologres has proven itself as the best choice for building a profile analysis platform with high stability and scalability and low development and O&M costs.