All Products
Search
Document Center

AnalyticDB:Configure incremental refresh for materialized views (preview)

Last Updated:Sep 26, 2024

Incremental refresh uses a specific algorithm to compute only the changed data, and then updates the data in materialized views. Compared with full refresh, incremental refresh features lower overheads and higher update frequency. If you want to refresh only the changed data, you can configure incremental refresh for materialized views. This topic describes how to configure incremental refresh when you create a materialized view.

Prerequisites

Before you configure incremental refresh for materialized views of an AnalyticDB for MySQL cluster, make sure that the minor version of the cluster meets the following requirements:

  • Incremental refresh for single-table materialized views: V3.1.9.0 or later.

  • Incremental refresh for multi-table materialized views: V3.2.1.0 or later.

Note
  • To query the minor version of an AnalyticDB for MySQL Data Lakehouse Edition cluster, execute the SELECT adb_version(); statement. To update the minor version of a cluster, contact technical support.

  • For information about how to view and update the minor version of an AnalyticDB for MySQL Data Warehouse Edition cluster, see Update the minor version of a cluster.

Scenarios

In most cases, materialized views are used to accelerate complex queries or simplify extract, transform, load (ETL) processes. A materialized view stores pre-computed query results and can be automatically refreshed based on your business requirements. Two methods can be used to refresh data: full refresh and incremental refresh. The following list describes the differences between the two methods:

  • A full refresh re-executes the corresponding SQL statement to overwrite all old data with the updated data.

  • An incremental refresh uses an algorithm to compute only the updated data and update specific data in materialized views.

Compared with full refresh, incremental refresh is more cost-effective in most cases and can be used to implement more frequent data updates. However, incremental refresh is not always better than full refresh. They are suitable for different scenarios.

Full refresh is suitable for the following batch processing scenarios:

  • Data is updated once every day or every hour. In this scenario, full refresh can handle complex SQL statements.

  • A minute-level latency is allowed for queries. In this scenario, full refresh of specific queries can be completed within a dozen seconds.

Incremental refresh is suitable for the following real-time scenarios:

  • Data is updated in real time.

  • Reports or ETL operations are updated in real time.

  • A second-level latency is required.

A materialized view displays query results from a historical point in time. A lower refresh latency provides query results in near real time. Specific data warehouses can update views in real time but at high maintenance costs and reduced write performance of base tables. AnalyticDB for MySQL uses a fully asynchronous algorithm to support all DML operations without compromising the write performance of base tables.

Incremental refresh is more cost-effective than full refresh because only incremental data is computed. If the same amount of data is computed, incremental refresh that uses a more complex algorithm costs more than full refresh. Therefore, incremental refresh is not suitable for batch processing scenarios. Incremental refresh has limits on the types of SQL statements.

Incremental refresh can meet specific stream computing requirements, especially when the incremental refresh results of materialized views are the same as the query results on base tables. By using incremental refresh, you can save the cost for maintaining streaming engine components.

Limits

  • Incremental refresh can be implemented only by using the auto-refresh feature. The auto-refresh interval ranges from 5 seconds to 5 minutes.

  • Materialized views must ensure the same results as the query results on base tables, and must support all DML operations. In this case, incremental refresh is not suitable for all query bodies. If the materialized view that you want to create cannot implement incremental refresh, an error occurs.

  • Partitioned tables cannot be used as base tables of a materialized view that supports incremental refresh.

  • You cannot perform INSERT OVERWRITE and TRUNCATE operations on the base tables of a materialized view that supports incremental refresh. Otherwise, an error occurs.

  • When you configure incremental refresh for multi-table materialized views, take note of the following items:

    • Multi-table materialized views support only INNER JOIN operations.

    • By default, you can join up to five tables to create a materialized view. To create a materialized view that involves more than five tables based on your cluster specifications, submit a ticket.

    • The fields that are used to join the involved tables must be original fields of the tables and must work as indexes. The fields must be of the same data type.

    • You cannot configure incremental refresh for multi-table materialized views that are based on replicated tables.

Required permissions

  • To create a materialized view, you must have the CREATE permission on databases or tables.

  • To refresh a materialized view, you must have the INSERT permission on databases or tables.

  • You must have the SELECT permission on the relevant table columns or all tables that are involved in a materialized view.

  • If you want to configure auto-refresh for a materialized view that you created, you must have permissions to refresh views by using the on-premises server (127.0.0.1) or an IP address ('%').

  • If you want to configure incremental refresh for the materialized view, make sure that you have the ALTER permission on all base tables that are involved in the materialized view.

Preparations

Before you create a materialized view that supports incremental refresh, perform the following operations:

  1. Enable the binary logging feature for the AnalyticDB for MySQL cluster.

    Important
    • If the minor version of the AnalyticDB for MySQL cluster is V3.2.0.0 or later, the binary logging feature is automatically enabled. If the minor version of the AnalyticDB for MySQL cluster is earlier than V3.2.0.0, you must execute the SET statement to enable the binary logging feature.

    • To query the minor version of an AnalyticDB for MySQL Data Lakehouse Edition cluster, execute the SELECT adb_version(); statement. To update the minor version of a cluster, contact technical support.

    • For information about how to view and update the minor version of an AnalyticDB for MySQL Data Warehouse Edition cluster, see Update the minor version of a cluster.

    SET ADB_CONFIG BINLOG_ENABLE=true;
  2. Enable the binary logging feature for base tables.

    ALTER TABLE <table_name> binlog=true;
    Important
    • You cannot perform INSERT OVERWRITE INTO or TRUNCATE operations on tables for which the binary logging feature is enabled.

    • After you create a materialized view that supports incremental refresh, you cannot disable the binary logging feature for base tables.

    • After you delete a materialized view that supports incremental refresh, you can execute the SET ADB_CONFIG BINLOG_ENABLE=false; and ALTER TABLE <table_name> binlog=false; statements to disable the binary logging feature for the AnalyticDB for MySQL cluster and base tables.

Configure incremental refresh when you create a materialized view

CREATE [OR REPLACE] MATERIALIZED VIEW <mv_name>
[MV DEFINITION]
REFRESH FAST [ON DEMAND] [START WITH date] <NEXT date>
AS
<QUERY BODY>

Parameters

Parameter

Required

Description

OR REPLACE

No

The rule that is used to create a materialized view based on whether the name of the materialized view that you want to create is the same as the name of an existing materialized view.

  • If no existing materialized view uses the same name, AnalyticDB for MySQL creates a materialized view.

  • If an existing materialized view uses the same name, AnalyticDB for MySQL deletes the existing materialized view and creates another materialized view.

mv_name

Yes

The name of the materialized view.

[MV DEFINITION]

No

The table-related properties in the materialized view. A materialized view uses a standard table schema to store data. You can use this parameter to specify the primary key, partition key, and index for the materialized view.

A primary key is automatically generated for a materialized view that supports incremental refresh. You can also specify a primary key. Primary keys are generated based on the following rules:

  • If the statement contains a GROUP BY clause, the primary key is the GROUP BY columns. For example, if GROUP BY a,b is specified, the primary key is the a and b columns.

  • If the statement contains aggregate operations but no GROUP BY clauses, the primary key must be set to a constant to ensure global uniqueness.

  • If the statement does not contain aggregate operations or GROUP BY clauses, you must set the primary key to the primary key of the base table.

Note

When you create a materialized view, you cannot specify columns that are not included in the query result. This rule is the same as the rule that is used when you create a table.

REFRESH FAST

Yes

The incremental refresh method.

ON DEMAND

No

The on-demand trigger mode of refreshes. Incremental refresh is automatically triggered based on the specified START WITH date and NEXT date parameters.

START WITH date

No

The first point in time when you want the materialized view to be automatically refreshed. If you do not specify this parameter, the current point in time is used.

NEXT date

Yes

The next point in time when you want the materialized view to be automatically refreshed.

QUERY BODY

Yes

The query body of the materialized view.

  • When you create a materialized view that supports incremental refresh, you cannot specify expressions that may produce indefinite values as conditions in the query body. Examples: NOW() and RAND().

  • The COUNT(), SUM(), MAX(), MIN(), AVG(), APPROX_DISTINCT(), and COUNT(DISTINCT) functions are supported for aggregate operations.

    Note

    Only AnalyticDB for MySQL clusters of V3.2.2.1 or later support the MAX(), MIN(), AVG(), APPROX_DISTINCT(), and COUNT(DISTINCT) functions.

  • If you use the MAX(), MIN(), APPROX_DISTINCT(), or COUNT(DISTINCT) function in the query body of a materialized view that supports incremental refresh, you can perform only INSERT operations on the base tables of the materialized view. You cannot perform operations that may result in data deletion, such as DELETE, UPDATE, REPLACE, and INSERT ON DUPLICATE KEY UPDATE, on the base tables. Otherwise, an error occurs.

  • The COUNT(DISTINCT) function supports only the INTEGER type.

  • The AVG() function does not support the DECIMAL type.

  • Aggregate operations do not support the HAVING keyword.

  • Aggregate operations except COUNT(DISTINCT) do not support the DISTINCT keyword.

Configure incremental refresh for a single-table materialized view

  1. Create a base table.

    CREATE TABLE tbl0
    (a bigint,
     b tinyint,
     c boolean,
     d decimal(15, 2),
     PRIMARY KEY(a))
    DISTRIBUTED BY HASH (a);
  2. Enable the binary logging feature.

    SET ADB_CONFIG BINLOG_ENABLE=true;
    ALTER TABLE tbl0 binlog=true;
  3. Create a materialized view that supports incremental refresh.

    • Create a single-table materialized view that supports incremental refresh at a 10-second interval based on a single table. The materialized view does not involve aggregate operations.

      CREATE MATERIALIZED VIEW mv0
      REFRESH FAST NEXT now() + INTERVAL 10 second
      AS
      SELECT a, b, c   # The system uses the primary key of the base table as the primary key of the materialized view. 
      FROM tbl0
      WHERE d > 1000;
    • Create a single-table materialized view that supports incremental refresh at a 5-second interval based on a single table. The materialized view involves the GROUP BY column and aggregate operations.

      CREATE MATERIALIZED VIEW mv1
      REFRESH FAST NEXT now() + INTERVAL 5 second
      AS
      SELECT
         b, c,                # The system uses the GROUP BY columns as the primary key of the materialized view. 
         COUNT(a) AS cnt_a,   # You can perform an aggregate operation on a column. 
         sum(d) AS sum_d,     # You can perform an aggregate operation on a column. 
         b / 100 AS new_b     # You can use an expression to perform a non-aggregate operation on a column. 
      FROM tbl0
      WHERE ifnull(d, 1) > 0  # You can use expressions in the WHERE clause. 
      GROUP BY b, c;
    • Create a materialized view that supports incremental refresh at a 1-minute interval based on a single table. The materialized view involves an aggregate operation without the GROUP BY column.

      CREATE MATERIALIZED VIEW mv2
      REFRESH FAST NEXT now() + INTERVAL 1 minute
      AS
      SELECT count(*) AS cnt   # The system generates a constant as the primary key to ensure that only one record is contained in the materialized view. 
      FROM tbl0;

Configure incremental refresh for a multi-table materialized view

  1. Create base tables and enable the binary logging feature.

    CREATE TABLE tbl0 (
      a0 bigint,
      b0 varchar,
      c0 int,
      d0 varchar,
      e0 boolean,
      PRIMARY KEY(a0)
    )
    DISTRIBUTED BY HASH (a0) BINLOG=true;
    
    CREATE TABLE tbl1 (
      a1 bigint,
      b1 varchar,
      c1 int,
      d1 varchar,
      e1 boolean,
      PRIMARY KEY(a1)
    )
    DISTRIBUTED BY HASH (a1) BINLOG=true;
    
    CREATE TABLE tbl2 (
      a2 bigint,
      b2 varchar,
      c2 int,
      d2 varchar,
      e2 boolean,
      PRIMARY KEY(a2)
    )
    DISTRIBUTED BY HASH (a2) BINLOG=true;
  2. Create a materialized view that supports incremental refresh.

    • Create a multi-table materialized view that supports incremental refresh at a 5-second interval based on multiple tables. The materialized view does not involve aggregate operations.

      CREATE MATERIALIZED VIEW mv3
      REFRESH FAST NEXT now() + INTERVAL 5 second
      AS
      SELECT a0, a1, a2, (c0 + c1 + c2) AS c
      FROM tbl0
      JOIN tbl1 ON b1 = b0
      JOIN tbl2 ON b2 = b1;
    • Create a multi-table materialized view that supports incremental refresh at a 10-second interval based on multiple tables. The materialized view involves the GROUP BY column and aggregate operations.

      CREATE MATERIALIZED VIEW mv4
      REFRESH FAST NEXT now() + INTERVAL 10 second
      AS
      SELECT 
      d0, d1, d2,                
      COUNT(*) AS cnt,           
      sum(c1) AS sum_c1,         
      sum(c2) AS sum_c2          
      FROM tbl0
      JOIN (SELECT b1, c1, d1 FROM tbl1) ON b1 = b0
      JOIN (SELECT * FROM tbl2 WHERE e2 = true) ON b2 = b1
      GROUP BY d0, d1, d2;