Incremental refresh uses a specific algorithm to compute only the changed data, and then updates the data in materialized views. Compared with full refresh, incremental refresh features lower overheads and higher update frequency. If you want to refresh only the changed data, you can configure incremental refresh for materialized views. This topic describes how to configure incremental refresh when you create a materialized view.
Prerequisites
Before you configure incremental refresh for materialized views of an AnalyticDB for MySQL cluster, make sure that the minor version of the cluster meets the following requirements:
Incremental refresh for single-table materialized views: V3.1.9.0 or later.
Incremental refresh for multi-table materialized views: V3.2.1.0 or later.
To query the minor version of an AnalyticDB for MySQL Data Lakehouse Edition cluster, execute the
SELECT adb_version();
statement. To update the minor version of a cluster, contact technical support.For information about how to view and update the minor version of an AnalyticDB for MySQL Data Warehouse Edition cluster, see Update the minor version of a cluster.
Scenarios
In most cases, materialized views are used to accelerate complex queries or simplify extract, transform, load (ETL) processes. A materialized view stores pre-computed query results and can be automatically refreshed based on your business requirements. Two methods can be used to refresh data: full refresh and incremental refresh. The following list describes the differences between the two methods:
A full refresh re-executes the corresponding SQL statement to overwrite all old data with the updated data.
An incremental refresh uses an algorithm to compute only the updated data and update specific data in materialized views.
Compared with full refresh, incremental refresh is more cost-effective in most cases and can be used to implement more frequent data updates. However, incremental refresh is not always better than full refresh. They are suitable for different scenarios.
Full refresh is suitable for the following batch processing scenarios:
Data is updated once every day or every hour. In this scenario, full refresh can handle complex SQL statements.
A minute-level latency is allowed for queries. In this scenario, full refresh of specific queries can be completed within a dozen seconds.
Incremental refresh is suitable for the following real-time scenarios:
Data is updated in real time.
Reports or ETL operations are updated in real time.
A second-level latency is required.
A materialized view displays query results from a historical point in time. A lower refresh latency provides query results in near real time. Specific data warehouses can update views in real time but at high maintenance costs and reduced write performance of base tables. AnalyticDB for MySQL uses a fully asynchronous algorithm to support all DML operations without compromising the write performance of base tables.
Incremental refresh is more cost-effective than full refresh because only incremental data is computed. If the same amount of data is computed, incremental refresh that uses a more complex algorithm costs more than full refresh. Therefore, incremental refresh is not suitable for batch processing scenarios. Incremental refresh has limits on the types of SQL statements.
Incremental refresh can meet specific stream computing requirements, especially when the incremental refresh results of materialized views are the same as the query results on base tables. By using incremental refresh, you can save the cost for maintaining streaming engine components.
Limits
Incremental refresh can be implemented only by using the auto-refresh feature. The auto-refresh interval ranges from 5 seconds to 5 minutes.
Materialized views must ensure the same results as the query results on base tables, and must support all DML operations. In this case, incremental refresh is not suitable for all query bodies. If the materialized view that you want to create cannot implement incremental refresh, an error occurs.
Partitioned tables cannot be used as base tables of a materialized view that supports incremental refresh.
You cannot perform INSERT OVERWRITE and TRUNCATE operations on the base tables of a materialized view that supports incremental refresh. Otherwise, an error occurs.
When you configure incremental refresh for multi-table materialized views, take note of the following items:
Multi-table materialized views support only INNER JOIN operations.
By default, you can join up to five tables to create a materialized view. To create a materialized view that involves more than five tables based on your cluster specifications, submit a ticket.
The fields that are used to join the involved tables must be original fields of the tables and must work as indexes. The fields must be of the same data type.
You cannot configure incremental refresh for multi-table materialized views that are based on replicated tables.
Required permissions
To create a materialized view, you must have the CREATE permission on databases or tables.
To refresh a materialized view, you must have the INSERT permission on databases or tables.
You must have the SELECT permission on the relevant table columns or all tables that are involved in a materialized view.
If you want to configure auto-refresh for a materialized view that you created, you must have permissions to refresh views by using the on-premises server (
127.0.0.1
) or an IP address ('%'
).If you want to configure incremental refresh for the materialized view, make sure that you have the ALTER permission on all base tables that are involved in the materialized view.
Preparations
Before you create a materialized view that supports incremental refresh, perform the following operations:
Enable the binary logging feature for the AnalyticDB for MySQL cluster.
ImportantIf the minor version of the AnalyticDB for MySQL cluster is V3.2.0.0 or later, the binary logging feature is automatically enabled. If the minor version of the AnalyticDB for MySQL cluster is earlier than V3.2.0.0, you must execute the
SET
statement to enable the binary logging feature.To query the minor version of an AnalyticDB for MySQL Data Lakehouse Edition cluster, execute the
SELECT adb_version();
statement. To update the minor version of a cluster, contact technical support.For information about how to view and update the minor version of an AnalyticDB for MySQL Data Warehouse Edition cluster, see Update the minor version of a cluster.
SET ADB_CONFIG BINLOG_ENABLE=true;
Enable the binary logging feature for base tables.
ALTER TABLE <table_name> binlog=true;
ImportantYou cannot perform INSERT OVERWRITE INTO or TRUNCATE operations on tables for which the binary logging feature is enabled.
After you create a materialized view that supports incremental refresh, you cannot disable the binary logging feature for base tables.
After you delete a materialized view that supports incremental refresh, you can execute the
SET ADB_CONFIG BINLOG_ENABLE=false;
andALTER TABLE <table_name> binlog=false;
statements to disable the binary logging feature for the AnalyticDB for MySQL cluster and base tables.
Configure incremental refresh when you create a materialized view
CREATE [OR REPLACE] MATERIALIZED VIEW <mv_name>
[MV DEFINITION]
REFRESH FAST [ON DEMAND] [START WITH date] <NEXT date>
AS
<QUERY BODY>
Parameters
Parameter | Required | Description |
| No | The rule that is used to create a materialized view based on whether the name of the materialized view that you want to create is the same as the name of an existing materialized view.
|
| Yes | The name of the materialized view. |
| No | The table-related properties in the materialized view. A materialized view uses a standard table schema to store data. You can use this parameter to specify the primary key, partition key, and index for the materialized view. A primary key is automatically generated for a materialized view that supports incremental refresh. You can also specify a primary key. Primary keys are generated based on the following rules:
Note When you create a materialized view, you cannot specify columns that are not included in the query result. This rule is the same as the rule that is used when you create a table. |
| Yes | The incremental refresh method. |
| No | The on-demand trigger mode of refreshes. Incremental refresh is automatically triggered based on the specified |
| No | The first point in time when you want the materialized view to be automatically refreshed. If you do not specify this parameter, the current point in time is used. |
| Yes | The next point in time when you want the materialized view to be automatically refreshed. |
| Yes | The query body of the materialized view.
|
Configure incremental refresh for a single-table materialized view
Create a base table.
CREATE TABLE tbl0 (a bigint, b tinyint, c boolean, d decimal(15, 2), PRIMARY KEY(a)) DISTRIBUTED BY HASH (a);
Enable the binary logging feature.
SET ADB_CONFIG BINLOG_ENABLE=true;
ALTER TABLE tbl0 binlog=true;
Create a materialized view that supports incremental refresh.
Create a single-table materialized view that supports incremental refresh at a 10-second interval based on a single table. The materialized view does not involve aggregate operations.
CREATE MATERIALIZED VIEW mv0 REFRESH FAST NEXT now() + INTERVAL 10 second AS SELECT a, b, c # The system uses the primary key of the base table as the primary key of the materialized view. FROM tbl0 WHERE d > 1000;
Create a single-table materialized view that supports incremental refresh at a 5-second interval based on a single table. The materialized view involves the GROUP BY column and aggregate operations.
CREATE MATERIALIZED VIEW mv1 REFRESH FAST NEXT now() + INTERVAL 5 second AS SELECT b, c, # The system uses the GROUP BY columns as the primary key of the materialized view. COUNT(a) AS cnt_a, # You can perform an aggregate operation on a column. sum(d) AS sum_d, # You can perform an aggregate operation on a column. b / 100 AS new_b # You can use an expression to perform a non-aggregate operation on a column. FROM tbl0 WHERE ifnull(d, 1) > 0 # You can use expressions in the WHERE clause. GROUP BY b, c;
Create a materialized view that supports incremental refresh at a 1-minute interval based on a single table. The materialized view involves an aggregate operation without the GROUP BY column.
CREATE MATERIALIZED VIEW mv2 REFRESH FAST NEXT now() + INTERVAL 1 minute AS SELECT count(*) AS cnt # The system generates a constant as the primary key to ensure that only one record is contained in the materialized view. FROM tbl0;
Configure incremental refresh for a multi-table materialized view
Create base tables and enable the binary logging feature.
CREATE TABLE tbl0 ( a0 bigint, b0 varchar, c0 int, d0 varchar, e0 boolean, PRIMARY KEY(a0) ) DISTRIBUTED BY HASH (a0) BINLOG=true; CREATE TABLE tbl1 ( a1 bigint, b1 varchar, c1 int, d1 varchar, e1 boolean, PRIMARY KEY(a1) ) DISTRIBUTED BY HASH (a1) BINLOG=true; CREATE TABLE tbl2 ( a2 bigint, b2 varchar, c2 int, d2 varchar, e2 boolean, PRIMARY KEY(a2) ) DISTRIBUTED BY HASH (a2) BINLOG=true;
Create a materialized view that supports incremental refresh.
Create a multi-table materialized view that supports incremental refresh at a 5-second interval based on multiple tables. The materialized view does not involve aggregate operations.
CREATE MATERIALIZED VIEW mv3 REFRESH FAST NEXT now() + INTERVAL 5 second AS SELECT a0, a1, a2, (c0 + c1 + c2) AS c FROM tbl0 JOIN tbl1 ON b1 = b0 JOIN tbl2 ON b2 = b1;
Create a multi-table materialized view that supports incremental refresh at a 10-second interval based on multiple tables. The materialized view involves the GROUP BY column and aggregate operations.
CREATE MATERIALIZED VIEW mv4 REFRESH FAST NEXT now() + INTERVAL 10 second AS SELECT d0, d1, d2, COUNT(*) AS cnt, sum(c1) AS sum_c1, sum(c2) AS sum_c2 FROM tbl0 JOIN (SELECT b1, c1, d1 FROM tbl1) ON b1 = b0 JOIN (SELECT * FROM tbl2 WHERE e2 = true) ON b2 = b1 GROUP BY d0, d1, d2;